In [2]:
import pandas as pd

Introduction¶

We have created a new dataframe from scraped data to avoid the deliberate errors included in the provided dataset and checked it against the plots on the sharedprosperity site. The process of creating and checking this dataframe and imputing missing values is detailed later in this document. First we will look at some of the findings that can help guide further analysis.

The rest of the first half of this document consists of dumps of images, code and data¶

But first a brief final word¶

This is structured as a progress report rather than a final report. We are not yet at the point of presenting complete conclusions, but I've gotten as far as I can without sacrificing another exemption day.

What we have so far is a usable dataframe that is, hopefully, as accurate as it can be made, plus an imputed dataframe produced with a single imputation method.

Note for next steps: try other imputation methods and compare the results¶
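As a starting point for that comparison, here is a minimal sketch. It is not the method used to build `new_df_imputed`, which is described later in this document. It assumes a year-indexed frame of numeric indicators, hides a random share of the known values, imputes them, and scores each method by how well it recovers the hidden values. The toy frame `demo` and the names `masked_rmse`, `linear_interp` and `column_mean` are made up for illustration.

```python
import numpy as np
import pandas as pd

def masked_rmse(df, impute, frac=0.2, seed=0):
    """Hide a random fraction of the known values, impute them,
    and return the RMSE between imputed and true values."""
    rng = np.random.default_rng(seed)
    known = df.notna().to_numpy()
    hide = known & (rng.random(df.shape) < frac)
    masked = df.mask(hide)                 # chosen cells become NaN
    filled = impute(masked)
    err = (filled.to_numpy() - df.to_numpy())[hide]
    return float(np.sqrt(np.nanmean(err ** 2)))

def linear_interp(df):
    # straight-line fill between neighbouring years, edges filled from the nearest value
    return df.interpolate(method="linear", limit_direction="both")

def column_mean(df):
    # every gap in a column gets that column's mean
    return df.fillna(df.mean())

# Toy data only: two smooth trends over the same span of years as the report
years = np.arange(1982, 2019)
demo = pd.DataFrame(
    {"trend_up": np.linspace(27.0, 34.0, len(years)),
     "trend_down": np.linspace(63.0, 56.0, len(years))},
    index=years,
)

methods = {"linear": linear_interp, "mean": column_mean}
scores = {name: masked_rmse(demo, fn) for name, fn in methods.items()}
print(scores)
```

On smooth, trending series like the toy ones, interpolation should recover hidden values far better than a mean fill. On the real indicators the ranking may differ, which is the point of running the comparison.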

Apart from that, we have factor analysis and component analysis output, some correlation matrices and some pair plots. The general gist is that a few factors seem to explain most of the variation in this data and that many of the variables are highly correlated. In particular, the variables within each dimension identified by the shared prosperity project are highly correlated with one another.
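To make the "highly correlated" claim easier to check, a sketch along these lines would list the most strongly correlated variable pairs from a correlation matrix. The 0.9 threshold, the `min_periods` value, the function name `top_correlated_pairs` and the toy columns are all assumptions for illustration, not values taken from the analysis.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(df, threshold=0.9, min_periods=10):
    """Return variable pairs whose absolute Pearson correlation
    is at least `threshold`, strongest first."""
    # pairwise-complete correlations; pairs with too few shared rows become NaN
    corr = df.corr(min_periods=min_periods)
    # keep the strict upper triangle so each pair appears once
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    pairs = pairs[pairs.abs() >= threshold]   # also drops any NaN entries
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

# Toy data only: "b" is a noisy multiple of "a", "c" is unrelated
rng = np.random.default_rng(0)
base = rng.normal(size=40)
demo = pd.DataFrame({
    "a": base,
    "b": 2 * base + rng.normal(scale=0.05, size=40),
    "c": rng.normal(size=40),
})
pairs = top_correlated_pairs(demo)
print(pairs)
```

Because the scraped frame has many gaps, `min_periods` matters: a correlation computed from only a handful of overlapping years is not worth much.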

What else? We are supposed to take a different direction from the one shown in the online project, which uses a set of eight dimensions. I propose to do that by looking at relationships between those dimensions, and also by looking at smaller numbers of factors, dimensions or variables.

It looks like we can go as low as one factor, which is pretty much inequality.

The take-away from all this is that there is a whole lot more work to be done, but this single factor has been getting worse the whole time and everything else seems to come from it. It is not just financial; it involves all these other factors too.

The closest thing to findings that we have here comes in the form of the following plot. Apart from the general idea that a small number of factors, possibly only one or two, explain all this, some factors seem to be beneficial. High among these are middle class income share and various education factors, but this could all change if we tried different imputations or factor analyses, so we'll see.

image-2.png

That's mostly it for conclusions or introductions.¶

The rest is as is, with explanations and comments distributed along the way, in two main sections:¶

- the rest of this first half, exploring the data¶

- the second half, looking at how we got the data to the point where we could do that (roughly)¶

Below is a scree plot, which shows the relative strength of each derived factor:

image.png
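For reference, the values behind a scree plot are the eigenvalues of the correlation matrix in descending order. The sketch below computes them with NumPy alone. It assumes a complete (already imputed) numeric frame and is not necessarily how the plot above was produced. The function name `scree_values` and the toy data, five columns driven by one latent factor, are illustrative.

```python
import numpy as np
import pandas as pd

def scree_values(df):
    """Eigenvalues of the correlation matrix, largest first,
    plus the share of total variance each one accounts for."""
    corr = df.corr().to_numpy()
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh returns ascending order
    share = eigvals / eigvals.sum()
    return eigvals, share

# Toy data only: five noisy copies of a single latent factor
rng = np.random.default_rng(1)
latent = rng.normal(size=200)
demo = pd.DataFrame(
    {f"x{i}": latent + rng.normal(scale=0.3, size=200) for i in range(5)}
)
eigvals, share = scree_values(demo)
print(eigvals.round(2), share.round(2))
```

Plotting `eigvals` against component number gives the scree plot. A sharp drop after the first one or two values is what supports the "small number of factors" reading.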

We also have some pair plots here, left over from an abortive attempt to create what felt like petabytes' worth of them with a poorly thought-out command.

chunk_0.png

chunk_1.png
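With more than a hundred indicator columns, a single pair plot would have over ten thousand panels, so plotting in small column groups is more manageable. The sketch below is one way to do that, not the command that was actually run. It assumes seaborn and matplotlib are installed, and the chunk size of 6 and the function names are hypothetical. The output file names follow the `chunk_N.png` pattern seen above.

```python
def column_chunks(columns, size=6):
    """Split a list of column names into consecutive groups of `size`."""
    columns = list(columns)
    return [columns[i:i + size] for i in range(0, len(columns), size)]

def save_pair_plots(df, size=6, prefix="chunk"):
    """Write one pair plot per column chunk to <prefix>_<n>.png."""
    # imported here so the chunking helper works without plotting libraries
    import matplotlib
    matplotlib.use("Agg")              # render to files, no display needed
    import matplotlib.pyplot as plt
    import seaborn as sns

    for n, cols in enumerate(column_chunks(df.columns, size)):
        # drop years where every column in this chunk is missing
        grid = sns.pairplot(df[cols].dropna(how="all"))
        grid.savefig(f"{prefix}_{n}.png")
        plt.close("all")               # free the figure before the next chunk

chunks = column_chunks(["GINI", "Unemployment rate", "Gender pay gap"], size=2)
print(chunks)
```

Note that this only plots pairs within each chunk. Relationships between variables in different chunks would need the chunks to be built deliberately, for example one variable from each dimension.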

The marks for presentation said not to include too many dataframe dumps, but I'll include one each here of my scraped dataframe and the imputed one, since I'm only giving you this notebook and none of the CSV files in my virtual machine's file system that I've been using for this. Unfortunately I've been on a Chromebook for the last few weeks because my laptop is away being fixed, so this whole thing has been done on the virtual machine, and I've pretty much been using its filesystem as an extension of the Jupyter interface.

In [13]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
In [14]:
new_df
Out[14]:
Unnamed: 0 Q5 to Q1 income share ratio D10 to D1 income share ratio D10 to D1-4 income share ratio (Palma) P90 to P10 income ratio P80 to P20 income ratio P80 to P50 income ratio P50 to P20 income ratio GINI Top 10 percent wealth share Top 5 percent wealth share Top 1 percent wealth share Lower deciles income share Middle class income share Unemployment rate Unemployment rate, 60-64 Unemployment rate, 65+ Underemployment rate Percentage of employees working long hours Labour market insecurity Long-term unemployment rate Percentage of youth NEET Percentage of workforce on low pay, OECD definition Percentage of workforce on low pay, relative to minimum wage Minimum to living wage gap Labour share of income Labour productivity to real product wages ratio Percentage of households spending above 30% of income on rental Percentage of households spending above 30% of income on housing House affordability, rent House affordability, purchase Percentage of home ownership Percentage of homelessness in population Percentage of Priority A state housing applicants in population Percentage of Priority B state housing applicants in population Percentage of disposable income spent on household debt Median multiple for housing Health expenditure as a percentage of GDP Health expenditure per capita, PPP Prevalence of depression, adult Prevalence of self-rated health as good or better, adult Prevalence of pyschological distress, adult Prevalence of mood_anxiety disorders, adult Prevalence of healthy weight, adult Prevalence of unmet need for after-hours care due to cost, adult Prevalence of unmet need for GP due to cost, adult Prevalence of adequate vegetable and fruit intake, adult Prevalence of breakfasting at home less than 5 days, child Prevalence of emotional_behavioural problems, child Prevalence of diabetes, adult Prevalence of depression, child Prevalence of good or better parent-rated health, child Prevalence of unfulfilled prescriptions due to cost, child Prevalence of unmet 
need to after hours care due to cost, child Prevalence of unmet need for GP due to cost, child Prevalence of adequate vegetable and fruit intake, child Prevalence of healthy weight, child Rate of suicides Prevalence of problem gambling interventions Prevalence of poverty 60% ML Prevalence of poverty 50% ML Prevalence of poverty 50% AL Prevalence of poverty 60% AL Prevalence of poverty 40% AL Prevalence of poverty, 60% ML, elderly Prevalence of poverty 50% ML, elderly Prevalence of poverty 50% AL, elderly Prevalence of poverty 60% AL, elderly Poverty risk ratio, 60% AL, single under 65 Poverty risk ratio, 60% AL, solo parent Poverty risk ratio, 50% AL, single under 65 Poverty risk ratio, 50% AL, solo parent Prevalence of poverty, 50% AL, child Prevalence of poverty, 60% AL, child Prevalence of poverty, 40% ML, child Prevalence of poverty, 50% ML, child Prevalence of poverty, 60% ML, child Prevalence of poverty, 60% AL, children with part time working parent_s Prevalence of poverty, 60% AL, children with full time working parent_s Prevalence of personal insolvencies Loan delinquencies Tertiary education participation Education expenditure, GDP Education expenditure, government expenses Tertiary loan as a percentage of income Tertiary loan leaving balance as a percentage of income University fees to income ratio Polytechnic fees to income ratio Wānanga fees to income ratio Degree earnings premium, hourly Diploma_certificate earnings premium, hourly School earnings premium, hourly Degree earnings premium, weekly Diploma_certificate earnings premium, weekly School earnings premium, weekly Percentage of population in remand Percentage of population sentenced Percentage of population post-sentence Incidence of crime victimisation Rate of murder and homicide Regional GDP variation Regional income inadequacy variation Inadequacy Of Income, Gender Inadequacy Of Income, Housing Tenure Inadequacy Of Income, Long-Term Migrant Inadequacy Of Income, Māori Low Income, Gender 
Gender pay gap
0 1982 4.110000 6.120000 0.910000 3.260000 2.300000 1.510000 0.660000 27.200000 NaN NaN NaN 22.200000 63.300000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 14.000000 9.000000 12.000000 NaN 6.000000 5.000000 2.000000 4.000000 NaN NaN NaN NaN NaN 18.000000 NaN 8.000000 13.000000 21.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 1983 4.125000 6.125000 0.920000 3.285000 2.305000 1.525000 0.665000 27.350000 NaN NaN NaN 22.250000 62.900000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 14.000000 8.500000 12.500000 NaN 5.500000 4.000000 2.000000 3.500000 NaN NaN NaN NaN NaN 19.500000 NaN 8.000000 13.500000 21.500000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 1984 4.140000 6.130000 0.930000 3.310000 2.310000 1.540000 0.670000 27.500000 NaN NaN NaN 22.300000 62.500000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 14.000000 8.000000 13.000000 NaN 5.000000 3.000000 2.000000 3.000000 NaN NaN NaN NaN NaN 21.000000 NaN 8.000000 14.000000 22.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 1985 4.090000 6.085000 0.925000 3.240000 2.265000 1.520000 0.675000 27.250000 NaN NaN NaN 22.550000 63.550000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 10.332 NaN 13.500000 8.000000 12.000000 NaN 4.500000 4.500000 2.000000 3.000000 NaN NaN NaN NaN NaN 19.000000 NaN 7.000000 13.000000 21.500000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 1986 4.040000 6.040000 0.920000 3.170000 2.220000 1.500000 0.680000 27.000000 NaN NaN NaN 22.800000 64.600000 4.2 1.3 2.00 NaN NaN NaN 0.331 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.634 NaN 13.000000 8.000000 11.000000 NaN 4.000000 6.000000 2.000000 3.000000 NaN NaN NaN NaN NaN 17.000000 NaN 6.000000 12.000000 21.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 1987 4.045000 6.070000 0.915000 3.145000 2.220000 1.505000 0.680000 27.050000 NaN NaN NaN 22.600000 64.150000 4.2 1.3 1.70 NaN NaN NaN 0.446 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 14.015 NaN 13.500000 8.000000 11.500000 NaN 4.500000 6.000000 2.000000 3.500000 NaN NaN NaN NaN NaN 17.000000 NaN 6.500000 12.000000 21.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 1988 4.050000 6.100000 0.910000 3.120000 2.220000 1.510000 0.680000 27.100000 NaN NaN NaN 22.400000 63.700000 5.8 2.3 2.30 NaN NaN NaN 0.782 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 14.591 NaN 14.000000 8.000000 12.000000 NaN 5.000000 6.000000 2.000000 4.000000 NaN NaN NaN NaN NaN 17.000000 NaN 7.000000 12.000000 21.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.000000 NaN
7 1989 4.240000 6.245000 1.005000 3.285000 2.305000 1.560000 0.675000 28.650000 NaN NaN NaN 22.100000 61.900000 7.3 2.3 3.40 NaN NaN NaN 1.269 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 13.963 NaN 14.500000 8.000000 12.500000 NaN 4.500000 6.000000 2.000000 3.500000 NaN NaN NaN NaN NaN 18.500000 NaN 6.500000 12.500000 21.500000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.000000 NaN
8 1990 4.430000 6.390000 1.100000 3.450000 2.390000 1.610000 0.670000 30.200000 NaN NaN NaN 21.300000 59.500000 8.0 3.3 2.10 NaN NaN NaN 1.760 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 13.532 NaN 15.000000 8.000000 13.000000 NaN 4.000000 6.000000 2.000000 3.000000 NaN NaN NaN NaN NaN 20.000000 NaN 6.000000 13.000000 22.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.000000 NaN
9 1991 4.765000 7.290000 1.140000 3.645000 2.465000 1.640000 0.665000 31.050000 NaN NaN NaN 20.200000 57.600000 10.6 2.8 2.50 NaN NaN NaN 2.578 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 73.298 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 13.559 NaN 18.000000 10.500000 18.000000 NaN 6.000000 4.500000 1.500000 4.000000 NaN NaN NaN NaN NaN 28.500000 NaN 8.500000 17.000000 27.500000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.500000 NaN
10 1992 5.100000 8.190000 1.180000 3.840000 2.540000 1.670000 0.660000 31.900000 NaN NaN NaN 20.000000 56.100000 10.7 2.7 1.70 NaN NaN NaN 3.448 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 72.666 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 13.954 NaN 21.000000 13.000000 23.000000 NaN 8.000000 3.000000 1.000000 5.000000 NaN NaN NaN NaN NaN 37.000000 NaN 11.000000 21.000000 33.000000 NaN NaN NaN 6.474 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.000000 NaN
11 1993 5.100000 8.115000 1.195000 3.900000 2.545000 1.675000 0.660000 32.050000 NaN NaN NaN 19.700000 54.300000 9.8 2.5 1.70 NaN NaN NaN 3.309 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 72.042 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.396 NaN 21.000000 13.500000 24.500000 NaN 8.500000 3.000000 1.000000 5.500000 NaN NaN NaN NaN NaN 39.000000 NaN 12.000000 21.500000 34.000000 NaN NaN NaN 1.380 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.500000 NaN
12 1994 5.100000 8.040000 1.210000 3.960000 2.550000 1.680000 0.660000 32.200000 NaN NaN NaN 19.900000 54.900000 8.4 3.3 2.30 NaN NaN NaN 2.750 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 71.422 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 14.137 NaN 21.000000 14.000000 26.000000 NaN 9.000000 3.000000 1.000000 6.000000 NaN NaN NaN NaN NaN 41.000000 NaN 13.000000 22.000000 35.000000 NaN NaN NaN -2.502 7.3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8.181 17.120 NaN NaN NaN NaN NaN NaN 2.000000 NaN
13 1995 5.155000 8.260000 1.245000 3.895000 2.550000 1.675000 0.655000 32.650000 NaN NaN NaN 19.800000 56.100000 6.5 2.7 1.50 NaN NaN NaN 1.667 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 70.790 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 14.772 NaN 20.500000 14.000000 24.000000 NaN 9.000000 4.000000 2.000000 6.000000 NaN NaN NaN NaN NaN 37.500000 NaN 13.500000 22.000000 33.500000 NaN NaN NaN -0.184 8.1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8.309 10.882 NaN NaN NaN NaN NaN NaN 2.000000 NaN
14 1996 5.210000 8.480000 1.280000 3.830000 2.550000 1.670000 0.650000 33.100000 NaN NaN NaN 19.800000 55.200000 6.3 2.9 1.40 NaN NaN NaN 1.333 NaN NaN NaN NaN 0.574 0.983 NaN NaN NaN NaN 70.204 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 14.462 NaN 20.000000 14.000000 22.000000 NaN 9.000000 5.000000 3.000000 6.000000 NaN NaN NaN NaN NaN 34.000000 NaN 14.000000 22.000000 32.000000 NaN NaN NaN -0.951 8.1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8.334 14.737 NaN NaN NaN NaN NaN NaN 2.000000 NaN
15 1997 5.230000 8.515000 1.280000 3.775000 2.560000 1.655000 0.645000 33.050000 NaN NaN NaN 18.900000 54.400000 6.8 3.3 1.00 NaN NaN NaN 1.346 NaN 13.270 NaN NaN 0.577 0.974 NaN NaN NaN NaN 69.633 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 14.858 NaN 20.000000 14.000000 20.500000 NaN 9.000000 6.000000 3.500000 6.500000 NaN NaN NaN NaN NaN 32.000000 NaN 14.000000 21.500000 31.500000 NaN NaN NaN 1.803 8.1 4.8 15.4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8.129 17.453 NaN NaN NaN NaN NaN NaN 2.000000 NaN
16 1998 5.250000 8.550000 1.280000 3.720000 2.570000 1.640000 0.640000 33.000000 NaN NaN NaN 19.600000 54.900000 7.7 3.4 1.20 NaN NaN NaN 1.508 NaN 13.121 NaN NaN 0.577 0.976 NaN NaN NaN NaN 69.066 NaN NaN NaN 8.3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 15.174 NaN 20.000000 14.000000 19.000000 NaN 9.000000 7.000000 4.000000 7.000000 NaN NaN NaN NaN NaN 30.000000 NaN 14.000000 21.000000 31.000000 NaN NaN NaN 3.999 8.2 4.9 15.7 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 7.855 13.894 NaN NaN NaN NaN NaN NaN 2.000000 16.2
17 1999 5.303333 8.520000 1.300000 3.816667 2.593333 1.653333 0.640000 33.266667 NaN NaN NaN 19.466667 54.333333 7.0 5.4 1.80 NaN NaN NaN 1.485 NaN 12.284 NaN NaN 0.586 0.961 NaN NaN NaN NaN 68.497 NaN NaN NaN 7.7 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 13.473 NaN 21.000000 14.333333 19.666667 NaN 9.000000 7.333333 4.000000 7.000000 NaN NaN NaN NaN NaN 30.666667 NaN 13.666667 22.000000 32.333333 NaN NaN 0.116 2.712 9.4 5.0 15.7 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.190 0.188 NaN 7.342 13.038 NaN NaN NaN NaN NaN NaN 2.333333 15.2
18 2000 5.356667 8.490000 1.320000 3.913333 2.616667 1.666667 0.640000 33.533333 NaN NaN NaN 19.333333 53.766667 6.1 4.8 1.70 NaN NaN NaN 1.222 NaN 11.739 NaN NaN 0.566 1.000 NaN NaN NaN NaN 67.919 NaN NaN NaN 9.4 NaN 7.5 1606.3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 11.891 NaN 22.000000 14.666667 20.333333 NaN 9.000000 7.666667 4.000000 7.000000 NaN NaN NaN NaN NaN 31.333333 NaN 13.333333 23.000000 33.666667 NaN NaN 0.094 2.259 10.1 5.0 16.4 30.095 46.791 5.560 5.369 3.859 NaN NaN NaN NaN NaN NaN 0.205 0.183 NaN 6.980 14.518 NaN NaN NaN NaN NaN NaN 2.666667 14.0
19 2001 5.410000 8.460000 1.340000 4.010000 2.640000 1.680000 0.640000 33.800000 NaN NaN NaN 19.200000 53.200000 5.5 3.4 1.30 NaN NaN NaN 0.945 NaN 12.229 NaN NaN 0.553 1.027 NaN NaN NaN NaN 67.657 0.737000 NaN NaN 8.5 3.204 7.6 1693.8 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 13.070 NaN 23.000000 15.000000 21.000000 28.000000 9.000000 8.000000 4.000000 7.000000 NaN NaN NaN NaN NaN 32.000000 41.000000 13.000000 24.000000 35.000000 NaN NaN 0.100 3.110 11.0 5.1 16.8 30.483 45.867 5.552 4.960 1.794 NaN NaN NaN NaN NaN NaN 0.224 0.186 NaN 6.841 13.656 0.540 NaN NaN NaN NaN NaN 3.000000 13.1
20 2002 5.453333 8.633333 1.326667 4.063333 2.673333 1.663333 0.626667 33.666667 NaN NaN NaN 19.133333 53.666667 5.3 4.0 1.70 NaN NaN NaN 0.782 NaN 13.630 NaN NaN 0.547 1.037 NaN NaN NaN NaN 67.468 0.748800 0.036 0.363 9.4 3.301 7.9 1834.4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 11.895 NaN 22.666667 15.333333 20.000000 26.666667 9.333333 8.333333 4.333333 6.333333 NaN NaN NaN NaN NaN 30.000000 38.666667 13.000000 23.666667 34.000000 NaN NaN 0.097 2.749 11.9 5.0 17.3 28.993 44.553 5.236 4.401 0.885 NaN NaN NaN NaN NaN NaN 0.219 0.176 NaN 6.917 16.714 0.385 NaN NaN NaN NaN NaN 2.000000 12.3
21 2003 5.496667 8.806667 1.313333 4.116667 2.706667 1.646667 0.613333 33.533333 NaN NaN NaN 19.066667 54.133333 4.8 3.7 1.20 NaN NaN NaN 0.650 NaN 13.222 NaN NaN 0.555 1.023 0.306 0.374 0.677 0.764 67.278 0.760600 0.023 0.342 9.1 3.621 7.7 1852.8 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.960 NaN 22.333333 15.666667 19.000000 25.333333 9.666667 8.666667 4.666667 5.666667 NaN NaN NaN NaN NaN 28.000000 36.333333 13.000000 23.333333 33.000000 NaN NaN 0.094 7.427 12.5 5.2 17.6 28.239 46.051 4.984 3.371 0.478 NaN NaN NaN NaN NaN NaN 0.225 0.176 0.133 6.802 11.423 0.644 NaN NaN NaN NaN NaN 1.000000 12.5
22 2004 5.540000 8.980000 1.300000 4.170000 2.740000 1.630000 0.600000 33.400000 55.0 41.0 20.0 19.000000 54.600000 4.0 2.5 1.35 3.3 NaN NaN 0.469 10.8 13.162 NaN NaN 0.558 1.016 0.284 0.443 0.663 0.787 67.087 0.772400 0.033 0.372 10.3 4.029 7.9 1985.5 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.082 NaN 22.000000 16.000000 18.000000 24.000000 10.000000 9.000000 5.000000 5.000000 NaN NaN NaN NaN NaN 26.000000 34.000000 13.000000 23.000000 32.000000 NaN NaN 0.092 3.252 13.1 5.2 18.1 27.533 43.883 5.143 3.051 0.541 NaN NaN NaN NaN NaN NaN 0.244 0.203 0.172 6.188 11.499 0.402 NaN NaN NaN NaN NaN 0.000000 12.7
23 2005 5.450000 8.663333 1.266667 4.116667 2.680000 1.623333 0.610000 32.966667 56.0 42.0 20.5 19.400000 56.100000 3.8 2.1 1.50 2.8 15.652 NaN 0.376 11.1 12.439 NaN NaN 0.561 1.015 0.270 0.501 0.654 0.804 66.898 0.784200 0.027 0.348 11.6 4.482 8.3 2123.6 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.452 0.001 21.000000 15.333333 16.666667 22.333333 9.666667 10.000000 6.000000 6.000000 NaN NaN NaN NaN NaN 23.666667 31.000000 12.000000 21.666667 29.666667 NaN NaN 0.098 2.432 13.5 5.1 17.7 27.471 45.621 5.134 3.075 0.601 NaN NaN NaN NaN NaN NaN 0.220 0.220 0.184 6.264 14.755 0.388 NaN NaN NaN NaN NaN 0.000000 14.0
24 2006 5.360000 8.346667 1.233333 4.063333 2.620000 1.616667 0.620000 32.533333 57.0 43.0 21.0 19.800000 57.600000 3.9 1.9 1.70 2.8 14.932 NaN 0.295 10.7 14.612 17.900000 NaN 0.575 0.989 0.264 0.548 0.652 0.819 66.614 0.796000 0.026 0.341 12.3 4.696 8.6 2396.5 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 12.520 0.001 20.000000 14.666667 15.333333 20.666667 9.333333 11.000000 7.000000 7.000000 NaN NaN NaN NaN NaN 21.333333 28.000000 11.000000 20.333333 27.333333 NaN NaN 0.099 1.922 13.3 6.1 20.1 27.988 43.394 5.161 3.331 0.575 64.286 33.929 7.143 152.733 92.926 9.003 0.266 0.221 0.180 6.358 11.710 0.469 NaN NaN NaN NaN NaN 0.000000 12.1
25 2007 5.270000 8.030000 1.200000 4.010000 2.560000 1.610000 0.630000 32.100000 56.0 42.0 20.0 20.200000 59.100000 3.6 1.2 1.40 3.1 14.743 3.250 0.213 10.9 13.441 19.933333 NaN 0.576 0.987 0.267 0.632 0.649 0.846 66.312 0.812857 0.025 0.328 13.1 5.000 8.3 2440.0 10.40 89.60 6.60 12.70 36.20 NaN NaN 43.0 7.40 1.8 5.10 0.20 97.60 NaN NaN NaN NaN 67.70 11.855 0.001 19.000000 14.000000 14.000000 19.000000 9.000000 12.000000 8.000000 8.000000 12.0 1.579 2.842 1.786 2.857 19.000000 25.000000 10.000000 19.000000 25.000000 34.0 14.0 0.114 2.870 13.1 5.4 17.2 28.624 43.098 5.204 3.488 0.601 63.421 37.691 7.441 159.615 105.128 7.692 0.275 0.226 0.188 6.183 12.075 0.577 NaN NaN NaN NaN NaN 0.000000 11.9
26 2008 5.320000 8.350000 1.290000 3.990000 2.560000 1.610000 0.630000 33.000000 55.0 41.0 19.0 19.800000 59.000000 4.0 2.1 1.10 3.2 14.154 3.389 0.174 11.1 12.534 21.966667 NaN 0.567 1.009 0.267 0.621 0.641 0.836 66.013 0.829714 0.034 0.300 13.8 4.708 9.1 2718.2 11.18 89.54 6.18 13.42 35.82 NaN NaN 43.3 7.66 2.1 5.18 0.22 97.64 NaN NaN NaN NaN 67.06 12.670 0.001 20.000000 13.000000 12.000000 19.000000 8.000000 12.000000 6.000000 6.000000 11.0 1.250 2.800 1.538 2.846 17.000000 26.000000 10.000000 17.000000 27.000000 42.0 11.0 0.120 8.929 12.4 5.1 16.8 28.846 43.389 5.276 3.652 0.467 67.344 32.066 4.787 151.190 92.262 3.571 0.270 0.191 0.176 6.114 11.972 0.440 NaN NaN NaN NaN NaN 2.000000 12.5
27 2009 5.240000 8.190000 1.270000 3.910000 2.530000 1.580000 0.630000 32.700000 54.5 40.0 18.5 20.000000 57.600000 5.8 2.9 1.40 4.3 13.436 5.233 0.374 14.1 12.873 24.000000 NaN 0.576 0.996 0.278 0.554 0.650 0.810 65.709 0.846571 0.041 0.295 10.8 4.652 9.7 2954.5 11.96 89.48 5.76 14.14 35.44 NaN NaN 43.6 7.92 2.4 5.26 0.24 97.68 NaN NaN NaN NaN 66.42 12.335 0.002 21.000000 14.000000 13.000000 18.000000 9.000000 9.000000 5.000000 5.000000 7.0 1.429 2.571 1.714 2.929 17.000000 25.000000 11.000000 20.000000 30.000000 42.0 14.0 0.175 24.076 12.4 6.0 17.9 27.606 41.846 5.277 3.726 0.564 62.500 25.875 6.250 135.866 103.951 11.246 0.289 0.202 0.189 6.313 15.805 0.543 0.0840 1.10 20.1000 5.70 14.20 1.000000 11.5
28 2010 5.290000 8.250000 1.240000 4.000000 2.550000 1.570000 0.620000 32.300000 54.0 39.0 18.0 20.100000 60.100000 6.1 3.0 1.50 3.9 13.769 5.601 0.546 13.4 12.722 23.966667 NaN 0.556 1.031 0.287 0.573 0.669 0.824 65.405 0.863429 0.052 0.299 10.1 4.777 9.7 3006.4 12.74 89.42 5.34 14.86 35.06 NaN NaN 43.9 8.18 2.7 5.34 0.26 97.72 NaN NaN NaN NaN 65.78 12.428 0.003 21.000000 15.000000 13.000000 18.000000 10.000000 11.000000 5.000000 4.000000 6.0 1.333 2.857 1.667 3.000 19.000000 26.000000 13.000000 23.000000 32.000000 33.0 18.0 0.197 10.734 12.1 6.0 18.3 28.846 42.217 5.462 3.920 0.617 63.789 28.075 5.590 146.951 104.268 4.573 0.291 0.207 0.200 5.986 9.654 0.548 0.0890 1.85 19.6000 4.25 11.75 1.000000 10.8
29 2011 5.910000 9.860000 1.440000 4.410000 2.670000 1.620000 0.610000 35.000000 55.0 40.2 18.8 18.900000 57.500000 6.0 2.9 1.70 4.0 13.285 5.592 0.530 13.2 13.693 23.933333 NaN 0.557 1.031 0.291 0.529 0.678 0.813 65.103 0.880286 0.057 0.254 9.0 4.703 9.6 3106.5 13.52 89.36 4.92 15.58 34.68 NaN NaN 44.2 8.44 3.0 5.42 0.28 97.76 NaN NaN NaN NaN 65.14 12.721 0.003 21.000000 15.000000 14.000000 21.000000 10.000000 9.000000 5.000000 5.000000 8.0 1.524 3.000 1.800 3.000 18.000000 28.000000 11.000000 19.000000 29.000000 34.0 13.0 0.170 7.118 11.0 5.7 16.6 31.948 48.219 5.625 4.116 0.634 60.906 25.149 4.291 146.407 98.204 9.281 0.284 0.199 0.214 5.839 8.440 0.453 0.0940 2.60 19.1000 2.80 9.30 1.000000 10.3
30 2012 5.160000 7.990000 1.220000 3.930000 2.640000 1.680000 0.630000 32.200000 56.0 41.4 19.6 20.100000 58.700000 6.4 3.9 1.60 4.2 13.264 5.885 0.851 13.5 13.555 23.900000 NaN 0.554 1.037 0.292 0.487 0.668 0.794 64.806 0.897143 0.059 0.171 8.7 4.805 9.7 3156.8 14.30 89.30 4.50 16.30 34.30 6.7 13.6 44.5 8.70 3.3 5.50 0.30 97.80 6.6 4.5 4.7 55.5 64.50 12.402 0.003 21.000000 15.000000 14.000000 19.000000 9.000000 9.000000 6.000000 5.000000 8.0 1.238 2.952 1.533 3.600 20.000000 27.000000 12.000000 23.000000 31.000000 35.0 14.0 0.140 6.752 10.8 5.4 16.9 32.055 49.784 5.740 3.996 0.509 59.096 20.137 1.716 143.353 92.486 0.289 0.272 0.183 0.194 5.247 10.209 0.473 0.0990 2.20 21.2805 3.75 10.50 1.000000 9.1
31 2013 5.430000 8.460000 1.320000 4.150000 2.620000 1.660000 0.630000 33.600000 57.0 42.6 20.4 19.700000 58.300000 5.8 3.6 1.60 4.1 14.085 5.333 0.700 11.9 14.080 24.233333 1.793 0.563 1.021 0.294 0.513 0.663 0.795 64.412 0.914000 0.117 0.182 8.5 5.038 9.4 3346.4 14.50 89.50 6.10 16.40 33.70 7.2 14.5 43.3 8.70 4.4 5.80 0.50 98.20 4.5 4.3 6.5 54.9 63.40 12.166 0.003 22.000000 15.000000 12.000000 18.000000 10.000000 10.000000 4.000000 2.000000 7.0 1.318 2.773 1.667 3.133 16.000000 24.000000 13.000000 21.000000 30.000000 39.0 12.0 0.118 3.526 10.6 5.7 17.9 32.636 51.101 5.835 3.935 0.490 59.833 29.167 3.500 162.393 94.017 9.972 0.265 0.182 0.195 5.231 10.806 0.478 0.1040 1.80 23.4610 4.70 11.70 1.000000 11.2
32 2014 5.800000 9.470000 1.370000 4.095000 2.770000 1.680000 0.610000 34.300000 58.0 43.8 21.2 19.000000 56.200000 5.4 2.9 1.60 4.1 14.032 4.764 0.733 11.4 13.865 24.566667 2.047 0.543 1.060 0.296 0.541 0.650 0.801 64.000 NaN 0.192 0.178 9.3 5.301 9.4 3453.3 15.50 91.40 6.20 18.60 33.40 7.0 14.0 41.5 8.40 4.0 5.40 0.70 98.40 3.9 3.9 5.3 53.4 62.60 11.721 0.003 21.000000 16.000000 13.000000 17.000000 10.000000 10.000000 6.000000 4.000000 7.0 1.667 2.952 2.062 3.375 18.000000 25.000000 13.000000 23.000000 31.000000 37.0 16.0 0.100 2.784 10.2 5.2 17.3 33.513 53.722 5.959 3.703 0.475 66.667 31.111 2.778 161.433 110.744 5.785 0.239 0.164 0.191 5.179 9.536 0.595 0.1190 2.85 22.6305 4.95 11.65 1.000000 9.9
33 2015 5.840000 9.690000 1.440000 4.040000 2.620000 1.610000 0.610000 35.000000 59.0 45.0 22.0 19.100000 57.600000 5.4 3.1 1.50 3.9 13.614 4.683 0.707 11.3 13.891 24.900000 2.452 0.557 1.032 0.289 0.516 0.632 0.786 63.585 NaN 0.186 0.168 8.8 5.505 9.3 3530.1 14.60 88.90 6.20 17.40 33.10 5.8 13.7 40.5 8.60 4.0 6.10 1.10 98.00 5.2 3.3 6.1 54.8 63.10 12.263 0.003 21.000000 15.000000 11.000000 16.000000 10.000000 11.000000 6.000000 4.000000 6.0 1.762 2.667 2.133 2.800 14.000000 22.000000 13.000000 21.000000 31.000000 44.0 12.0 0.100 2.861 9.8 5.3 17.8 33.075 53.398 6.058 3.693 0.459 63.135 29.730 2.703 157.105 103.753 7.239 0.248 0.166 0.188 5.538 10.443 0.464 0.1340 3.90 21.8000 5.20 11.60 1.000000 11.8
34 2016 5.520000 8.910000 1.330000 4.160000 2.590000 1.640000 0.630000 33.600000 NaN NaN NaN 19.600000 57.300000 5.1 3.0 1.20 4.3 15.021 4.400 0.721 12.0 11.188 NaN 2.234 0.556 1.033 0.289 0.543 0.614 0.789 63.178 NaN 0.191 0.119 8.0 5.797 NaN NaN 15.40 87.80 6.80 18.80 32.00 6.9 14.3 40.1 10.30 4.3 5.80 0.30 97.70 3.8 4.0 4.5 48.5 63.90 12.328 0.003 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.108 3.776 9.4 5.2 17.8 34.409 56.221 6.200 3.600 0.400 55.321 28.432 2.828 160.976 111.653 16.531 0.282 0.179 0.201 5.674 10.654 0.341 0.1355 3.00 22.1000 5.20 13.75 1.000000 12.0
35 2017 5.700000 9.460000 1.380000 4.060000 2.620000 1.600000 0.610000 34.300000 NaN NaN NaN NaN NaN 4.7 3.1 1.20 4.4 NaN NaN 0.735 11.8 12.092 NaN 3.041 NaN NaN 0.284 0.584 0.601 0.802 62.765 NaN 0.282 0.133 7.9 6.105 NaN NaN 16.70 88.20 7.60 19.90 31.90 6.6 14.3 38.8 9.30 4.9 5.60 0.50 98.10 3.9 2.6 3.0 49.8 62.50 12.636 0.002 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.096 1.048 9.0 NaN NaN 32.167 NaN 6.200 3.700 0.400 51.050 25.000 0.000 170.000 116.216 15.946 0.308 0.192 0.212 5.507 7.301 0.517 0.1370 2.10 22.4000 5.20 15.90 1.000000 9.4
36 2018 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4.3 2.6 1.60 4.3 NaN NaN NaN 11.9 NaN NaN 2.810 NaN NaN 0.282 0.590 0.590 0.801 62.355 NaN 0.438 0.156 7.8 6.240 NaN NaN 16.60 87.50 8.60 20.90 31.80 6.9 14.9 39.4 9.30 5.6 5.90 0.60 98.00 3.0 2.4 2.0 50.0 63.50 13.667 0.002 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.088 2.185 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.315 0.179 0.216 3.906 NaN 0.326 NaN NaN NaN NaN NaN NaN 9.2
In [15]:
new_df_imputed
Out[15]:
Unnamed: 0 Q5 to Q1 income share ratio D10 to D1 income share ratio D10 to D1-4 income share ratio (Palma) P90 to P10 income ratio P80 to P20 income ratio P80 to P50 income ratio P50 to P20 income ratio GINI Top 10 percent wealth share Top 5 percent wealth share Top 1 percent wealth share Lower deciles income share Middle class income share Unemployment rate Unemployment rate, 60-64 Unemployment rate, 65+ Underemployment rate Percentage of employees working long hours Labour market insecurity Long-term unemployment rate Percentage of youth NEET Percentage of workforce on low pay, OECD definition Percentage of workforce on low pay, relative to minimum wage Minimum to living wage gap Labour share of income Labour productivity to real product wages ratio Percentage of households spending above 30% of income on rental Percentage of households spending above 30% of income on housing House affordability, rent House affordability, purchase Percentage of home ownership Percentage of homelessness in population Percentage of Priority A state housing applicants in population Percentage of Priority B state housing applicants in population Percentage of disposable income spent on household debt Median multiple for housing Health expenditure as a percentage of GDP Health expenditure per capita, PPP Prevalence of depression, adult Prevalence of self-rated health as good or better, adult Prevalence of pyschological distress, adult Prevalence of mood_anxiety disorders, adult Prevalence of healthy weight, adult Prevalence of unmet need for after-hours care due to cost, adult Prevalence of unmet need for GP due to cost, adult Prevalence of adequate vegetable and fruit intake, adult Prevalence of breakfasting at home less than 5 days, child Prevalence of emotional_behavioural problems, child Prevalence of diabetes, adult Prevalence of depression, child Prevalence of good or better parent-rated health, child Prevalence of unfulfilled prescriptions due to cost, child Prevalence of unmet 
need to after hours care due to cost, child Prevalence of unmet need for GP due to cost, child Prevalence of adequate vegetable and fruit intake, child Prevalence of healthy weight, child Rate of suicides Prevalence of problem gambling interventions Prevalence of poverty 60% ML Prevalence of poverty 50% ML Prevalence of poverty 50% AL Prevalence of poverty 60% AL Prevalence of poverty 40% AL Prevalence of poverty, 60% ML, elderly Prevalence of poverty 50% ML, elderly Prevalence of poverty 50% AL, elderly Prevalence of poverty 60% AL, elderly Poverty risk ratio, 60% AL, single under 65 Poverty risk ratio, 60% AL, solo parent Poverty risk ratio, 50% AL, single under 65 Poverty risk ratio, 50% AL, solo parent Prevalence of poverty, 50% AL, child Prevalence of poverty, 60% AL, child Prevalence of poverty, 40% ML, child Prevalence of poverty, 50% ML, child Prevalence of poverty, 60% ML, child Prevalence of poverty, 60% AL, children with part time working parent_s Prevalence of poverty, 60% AL, children with full time working parent_s Prevalence of personal insolvencies Loan delinquencies Tertiary education participation Education expenditure, GDP Education expenditure, government expenses Tertiary loan as a percentage of income Tertiary loan leaving balance as a percentage of income University fees to income ratio Polytechnic fees to income ratio Wānanga fees to income ratio Degree earnings premium, hourly Diploma_certificate earnings premium, hourly School earnings premium, hourly Degree earnings premium, weekly Diploma_certificate earnings premium, weekly School earnings premium, weekly Percentage of population in remand Percentage of population sentenced Percentage of population post-sentence Incidence of crime victimisation Rate of murder and homicide Regional GDP variation Regional income inadequacy variation Inadequacy Of Income, Gender Inadequacy Of Income, Housing Tenure Inadequacy Of Income, Long-Term Migrant Inadequacy Of Income, Māori Low Income, Gender 
Gender pay gap
1982 1982.0 4.110000 6.120000 0.910000 3.260000 2.300000 1.510000 0.660000 27.200000 55.7 41.60 19.80 22.200000 63.300000 5.9 2.1 2.30 3.24 14.5834 4.6130 0.9176 11.58 13.3956 21.546667 2.3134 0.5744 0.9870 0.2692 0.5712 0.6492 0.8230 68.8644 0.813869 0.0306 0.3224 11.04 4.7076 8.80 2526.56 11.960 89.480 5.760 14.140 35.440 6.72 14.02 43.60 7.920 2.40 5.260 0.240 97.680 4.80 4.00 5.42 53.42 66.420 13.107 0.0012 14.000000 9.000000 12.000000 19.800000 6.000000 5.000000 2.000000 4.000000 8.8 1.4230 2.8140 1.7010 2.9264 18.000000 27.000000 8.000000 13.000000 21.000000 37.0 14.0 0.1094 3.9046 11.02 5.26 17.04 28.1070 43.4696 5.2104 3.4544 0.5616 63.6914 30.9420 5.9824 149.1622 98.4942 8.1586 0.2442 0.2092 0.1834 6.9278 13.4208 0.4834 0.0940 1.91 20.7083 4.24 11.49 0.500000 13.58
1983 1983.0 4.125000 6.125000 0.920000 3.285000 2.305000 1.525000 0.665000 27.350000 55.7 41.60 19.80 22.250000 62.900000 5.9 2.1 2.30 3.24 14.5834 4.6130 0.9176 11.58 13.3956 21.546667 2.3134 0.5758 0.9818 0.2692 0.5712 0.6492 0.8230 70.1950 0.813869 0.0306 0.3224 11.04 4.7076 8.80 2526.56 11.960 89.480 5.760 14.140 35.440 6.72 14.02 43.60 7.920 2.40 5.260 0.240 97.680 4.80 4.00 5.42 53.42 66.420 13.107 0.0012 14.000000 8.500000 12.500000 19.800000 5.500000 4.000000 2.000000 3.500000 8.8 1.4230 2.8140 1.7010 2.9264 19.500000 27.000000 8.000000 13.500000 21.500000 37.0 14.0 0.1094 3.4136 10.16 5.26 17.04 28.1070 43.4696 5.2104 3.4544 0.5616 63.6914 30.9420 5.9824 149.1622 98.4942 8.1586 0.2442 0.2092 0.1834 7.3718 13.9738 0.4834 0.0940 1.91 20.7083 4.24 11.49 0.500000 13.58
1984 1984.0 4.140000 6.130000 0.930000 3.310000 2.310000 1.540000 0.670000 27.500000 55.7 41.60 19.80 22.300000 62.500000 5.9 2.1 2.30 3.24 14.5834 4.6130 0.9176 11.58 13.3456 21.546667 2.3134 0.5780 0.9762 0.2692 0.5712 0.6492 0.8230 70.9734 0.813869 0.0306 0.3224 11.04 4.7076 8.36 2256.92 11.960 89.480 5.760 14.140 35.440 6.72 14.02 43.60 7.920 2.40 5.260 0.240 97.680 4.80 4.00 5.42 53.42 66.420 13.107 0.0012 14.000000 8.000000 13.000000 19.800000 5.000000 3.000000 2.000000 3.000000 8.8 1.4230 2.8140 1.7010 2.9264 21.000000 27.000000 8.000000 14.000000 22.000000 37.0 14.0 0.1094 2.8074 9.38 5.24 16.82 28.6048 44.4586 5.2670 3.7830 1.2206 63.6914 30.9420 5.9824 149.1622 98.4942 8.1586 0.2442 0.2092 0.1834 7.5686 14.2394 0.4834 0.0940 1.91 20.7083 4.24 11.49 0.500000 13.58
1985 1985.0 4.090000 6.085000 0.925000 3.240000 2.265000 1.520000 0.675000 27.250000 55.7 41.60 19.80 22.550000 63.550000 5.9 2.1 2.30 3.24 14.5834 4.6130 0.9176 11.58 13.3956 21.546667 2.3134 0.5744 0.9870 0.2692 0.5712 0.6492 0.8230 68.2606 0.813869 0.0306 0.3224 11.82 4.7076 8.80 2526.56 11.960 89.480 5.760 14.140 35.440 6.72 14.02 43.60 7.920 2.40 5.260 0.240 97.680 4.80 4.00 5.42 53.42 66.420 10.332 0.0012 13.500000 8.000000 12.000000 19.800000 4.500000 4.500000 2.000000 3.000000 8.8 1.4230 2.8140 1.7010 2.9264 19.000000 27.000000 7.000000 13.000000 21.500000 37.0 14.0 0.1094 3.9046 11.02 5.26 17.04 28.1070 43.4696 5.2104 3.4544 0.5616 63.6914 30.9420 5.9824 149.1622 98.4942 8.1586 0.2442 0.2092 0.1834 6.9278 13.4208 0.4834 0.0940 1.91 20.7083 4.24 11.49 0.100000 13.34
1986 1986.0 4.040000 6.040000 0.920000 3.170000 2.220000 1.500000 0.680000 27.000000 55.7 41.60 19.80 22.800000 64.600000 4.2 1.3 2.00 3.24 14.5834 4.6130 0.3310 11.58 13.3162 21.546667 2.3134 0.5742 0.9914 0.2692 0.5712 0.6492 0.8230 68.2606 0.813869 0.0306 0.3224 11.66 4.7076 8.80 2526.56 11.960 89.480 5.760 14.140 35.440 6.72 14.02 43.60 7.920 2.40 5.260 0.240 97.680 4.80 4.00 5.42 53.42 66.420 12.634 0.0012 13.000000 8.000000 11.000000 19.800000 4.000000 6.000000 2.000000 3.000000 8.8 1.4230 2.8140 1.7010 2.9264 17.000000 27.000000 6.000000 12.000000 21.000000 37.0 14.0 0.1212 8.3592 11.88 5.50 17.54 28.1070 43.4696 5.2104 3.4544 0.5616 63.6914 30.9420 5.9824 149.1622 98.4942 8.1586 0.2640 0.2120 0.1834 6.5646 13.0912 0.4834 0.0940 1.91 20.7083 4.24 11.49 0.500000 12.84
1987 1987.0 4.045000 6.070000 0.915000 3.145000 2.220000 1.505000 0.680000 27.050000 55.7 41.60 19.80 22.600000 64.150000 4.2 1.3 1.70 3.24 14.5834 4.6130 0.4460 11.58 13.3162 21.546667 2.3134 0.5742 0.9914 0.2692 0.5712 0.6492 0.8230 68.2606 0.813869 0.0306 0.3224 11.66 4.7076 8.80 2526.56 11.960 89.480 5.760 14.140 35.440 6.72 14.02 43.60 7.920 2.40 5.260 0.240 97.680 4.80 4.00 5.42 53.42 66.420 14.015 0.0012 13.500000 8.000000 11.500000 19.800000 4.500000 6.000000 2.000000 3.500000 8.8 1.4230 2.8140 1.7010 2.9264 17.000000 27.000000 6.500000 12.000000 21.000000 37.0 14.0 0.1212 8.3592 11.88 5.50 17.54 28.1070 43.4696 5.2104 3.4544 0.5616 63.6914 30.9420 5.9824 149.1622 98.4942 8.1586 0.2640 0.2120 0.1834 6.5646 13.0912 0.4834 0.0940 1.91 20.7083 4.24 11.49 0.500000 12.84
1988 1988.0 4.050000 6.100000 0.910000 3.120000 2.220000 1.510000 0.680000 27.100000 55.7 41.60 19.80 22.400000 63.700000 5.8 2.3 2.30 3.24 14.5834 4.6130 0.7820 11.58 13.3162 21.546667 2.3134 0.5742 0.9914 0.2692 0.5712 0.6492 0.8230 68.2606 0.813869 0.0306 0.3224 11.66 4.7076 8.80 2526.56 11.960 89.480 5.760 14.140 35.440 6.72 14.02 43.60 7.920 2.40 5.260 0.240 97.680 4.80 4.00 5.42 53.42 66.420 14.591 0.0012 14.000000 8.000000 12.000000 19.800000 5.000000 6.000000 2.000000 4.000000 8.8 1.4230 2.8140 1.7010 2.9264 17.000000 27.000000 7.000000 12.000000 21.000000 37.0 14.0 0.1212 8.3592 11.88 5.50 17.54 28.1070 43.4696 5.2104 3.4544 0.5616 63.6914 30.9420 5.9824 149.1622 98.4942 8.1586 0.2640 0.2120 0.1834 6.5646 13.0912 0.4834 0.0940 1.91 20.7083 4.24 11.49 0.000000 12.84
1989 1989.0 4.240000 6.245000 1.005000 3.285000 2.305000 1.560000 0.675000 28.650000 55.7 41.60 19.80 22.100000 61.900000 7.3 2.3 3.40 3.24 14.5834 4.6130 1.2690 11.58 13.3162 21.546667 2.3134 0.5742 0.9914 0.2692 0.5712 0.6492 0.8230 68.2606 0.813869 0.0306 0.3224 11.66 4.7076 8.80 2526.56 11.960 89.480 5.760 14.140 35.440 6.72 14.02 43.60 7.920 2.40 5.260 0.240 97.680 4.80 4.00 5.42 53.42 66.420 13.963 0.0012 14.500000 8.000000 12.500000 19.800000 4.500000 6.000000 2.000000 3.500000 8.8 1.4230 2.8140 1.7010 2.9264 18.500000 27.000000 6.500000 12.500000 21.500000 37.0 14.0 0.1212 8.3592 11.88 5.50 17.54 28.1070 43.4696 5.2104 3.4544 0.5616 63.6914 30.9420 5.9824 149.1622 98.4942 8.1586 0.2640 0.2120 0.1834 6.5646 13.0912 0.4834 0.0940 1.91 20.7083 4.24 11.49 0.000000 12.84
1990 1990.0 4.430000 6.390000 1.100000 3.450000 2.390000 1.610000 0.670000 30.200000 55.7 41.60 19.80 21.300000 59.500000 8.0 3.3 2.10 3.24 14.5834 4.6130 1.7600 11.58 13.3956 21.546667 2.3134 0.5744 0.9870 0.2692 0.5712 0.6492 0.8230 68.2606 0.813869 0.0306 0.3224 11.82 4.7076 8.80 2526.56 11.960 89.480 5.760 14.140 35.440 6.72 14.02 43.60 7.920 2.40 5.260 0.240 97.680 4.80 4.00 5.42 53.42 66.420 13.532 0.0012 15.000000 8.000000 13.000000 19.800000 4.000000 6.000000 2.000000 3.000000 8.8 1.4230 2.8140 1.7010 2.9264 20.000000 27.000000 6.000000 13.000000 22.000000 37.0 14.0 0.1212 3.9046 11.02 5.26 17.04 28.1070 43.4696 5.2104 3.4544 0.5616 63.6914 30.9420 5.9824 149.1622 98.4942 8.1586 0.2640 0.2120 0.1834 6.9278 13.4208 0.4834 0.0940 1.91 20.7083 4.24 11.49 0.000000 13.34
1991 1991.0 4.765000 7.290000 1.140000 3.645000 2.465000 1.640000 0.665000 31.050000 55.7 41.60 20.00 20.200000 57.600000 10.6 2.8 2.50 3.26 14.5834 4.6130 2.5780 11.52 12.8088 21.546667 2.3134 0.5760 0.9788 0.2782 0.4996 0.6590 0.8040 73.2980 0.760600 0.0290 0.3532 8.78 3.7274 7.72 1794.56 11.960 89.480 5.760 14.140 35.440 6.72 14.02 43.60 7.920 2.40 5.260 0.240 97.680 4.80 4.00 5.42 53.42 66.420 13.559 0.0012 18.000000 10.500000 18.000000 25.266667 6.000000 4.500000 1.500000 4.000000 8.8 1.4230 2.8140 1.7010 2.9264 28.500000 36.200000 8.500000 17.000000 27.500000 37.0 14.0 0.1002 2.8074 8.38 4.94 16.10 29.0686 45.4290 5.2950 4.2304 1.5114 63.6914 30.9420 5.9824 149.1622 98.4942 8.1586 0.2126 0.1818 0.1714 7.9938 14.0008 0.4718 0.0940 1.91 20.7083 4.24 11.49 0.500000 14.04
1992 1992.0 5.100000 8.190000 1.180000 3.840000 2.540000 1.670000 0.660000 31.900000 55.6 41.28 19.82 20.000000 56.100000 10.7 2.7 1.70 3.40 14.4720 4.7434 3.4480 11.90 12.5286 21.926667 2.3134 0.5760 0.9788 0.2832 0.4706 0.6628 0.7936 72.6660 0.760600 0.0290 0.3532 8.66 3.7274 7.72 1794.56 12.428 89.444 5.508 14.572 35.212 6.88 14.14 43.78 8.076 2.58 5.308 0.252 97.704 4.54 3.86 4.80 52.42 66.036 13.954 0.0018 21.000000 13.000000 23.000000 25.266667 8.000000 3.000000 1.000000 5.000000 9.0 1.3848 2.8902 1.6648 3.0606 37.000000 36.200000 11.000000 21.000000 33.000000 35.6 14.0 0.1002 6.4740 7.96 4.96 16.00 29.0686 45.4290 5.2950 4.2304 1.5114 62.2996 28.9962 5.2362 149.8118 98.6024 6.1676 0.2126 0.1818 0.1726 8.1616 14.8172 0.4718 0.1073 2.50 20.9422 4.19 11.39 1.000000 14.16
1993 1993.0 5.100000 8.115000 1.195000 3.900000 2.545000 1.675000 0.660000 32.050000 56.0 41.88 20.22 19.700000 54.300000 9.8 2.5 1.70 3.48 14.7224 4.9456 3.3090 11.62 12.5286 21.926667 2.3134 0.5760 0.9788 0.2832 0.4706 0.6628 0.7936 72.0420 0.760600 0.0290 0.3532 8.66 3.7274 7.72 1794.56 13.272 89.096 5.632 15.648 34.448 6.88 14.14 43.14 8.604 3.02 5.432 0.268 97.716 4.54 3.86 4.80 52.42 65.404 12.396 0.0018 21.000000 13.500000 24.500000 25.266667 8.500000 3.000000 1.000000 5.500000 9.0 1.3848 2.8902 1.6648 3.0606 39.000000 36.200000 12.000000 21.500000 34.000000 35.6 14.0 0.1002 1.3800 8.20 4.96 16.00 29.0686 45.4290 5.2950 4.2304 1.5114 60.6060 29.0676 4.6838 152.6168 100.0794 8.5592 0.2126 0.1818 0.1726 8.0590 14.6460 0.4718 0.1073 2.50 20.9422 4.19 11.39 1.500000 14.16
1994 1994.0 5.100000 8.040000 1.210000 3.960000 2.550000 1.680000 0.660000 32.200000 56.0 41.88 20.22 19.900000 54.900000 8.4 3.3 2.30 3.48 14.7224 4.9456 2.7500 11.62 12.5286 21.926667 2.3134 0.5712 0.9890 0.2826 0.4818 0.6520 0.7926 71.4220 0.760600 0.0290 0.3532 8.66 3.7274 7.72 1794.56 14.064 88.852 6.084 16.656 33.816 6.88 14.14 42.12 8.828 3.46 5.484 0.316 97.792 4.54 3.86 4.80 52.42 64.748 14.137 0.0018 21.000000 14.000000 26.000000 25.266667 9.000000 3.000000 1.000000 6.000000 8.2 1.4682 2.9206 1.7696 3.1664 41.000000 36.200000 13.000000 22.000000 35.000000 34.6 15.0 0.1002 -2.5020 7.30 4.96 16.00 29.0686 45.4290 5.2950 4.2304 1.5114 58.6348 29.0378 3.8256 157.3354 103.6818 9.8922 0.2126 0.1818 0.1740 8.1810 17.1200 0.4718 0.1109 2.35 20.8961 4.24 12.24 2.000000 14.16
1995 1995.0 5.155000 8.260000 1.245000 3.895000 2.550000 1.675000 0.655000 32.650000 56.0 41.88 20.22 19.800000 56.100000 6.5 2.7 1.50 3.48 14.7224 4.9456 1.6670 11.62 12.5286 21.926667 2.3134 0.5760 0.9788 0.2826 0.4818 0.6520 0.7926 70.7900 0.760600 0.0290 0.3532 8.66 3.7274 7.72 1794.56 13.272 89.096 5.632 15.648 34.448 6.88 14.14 43.14 8.604 3.02 5.432 0.268 97.716 4.54 3.86 4.80 52.42 65.404 14.772 0.0018 20.500000 14.000000 24.000000 25.266667 9.000000 4.000000 2.000000 6.000000 8.2 1.4682 2.9206 1.7696 3.1664 37.500000 36.200000 13.500000 22.000000 33.500000 34.6 15.0 0.1002 -0.1840 8.10 4.96 16.00 29.0686 45.4290 5.2950 4.2304 1.5114 60.6060 29.0676 4.6838 152.6168 100.0794 8.5592 0.2126 0.1818 0.1740 8.3090 10.8820 0.4718 0.1109 2.35 20.8961 4.24 12.24 2.000000 14.16
1996 1996.0 5.210000 8.480000 1.280000 3.830000 2.550000 1.670000 0.650000 33.100000 56.0 41.88 20.22 19.800000 55.200000 6.3 2.9 1.40 3.24 14.7224 4.9456 1.3330 11.40 12.5286 21.926667 2.3134 0.5740 0.9830 0.2782 0.4996 0.6590 0.8040 70.2040 0.760600 0.0290 0.3532 8.66 3.7274 7.72 1794.56 13.272 89.096 5.632 15.648 34.448 6.88 14.14 43.14 8.604 3.02 5.432 0.268 97.716 4.54 3.86 4.80 52.42 65.404 14.462 0.0018 20.000000 14.000000 22.000000 25.266667 9.000000 5.000000 3.000000 6.000000 8.2 1.4682 2.9206 1.7696 3.1664 34.000000 36.200000 14.000000 22.000000 32.000000 34.6 15.0 0.1002 -0.9510 8.10 4.96 16.00 29.0686 45.4290 5.2950 4.2304 1.5114 60.6060 29.0676 4.6838 152.6168 100.0794 8.5592 0.2126 0.1818 0.1714 8.3340 14.7370 0.4718 0.1073 2.50 20.9422 4.19 11.39 2.000000 14.16
1997 1997.0 5.230000 8.515000 1.280000 3.775000 2.560000 1.655000 0.645000 33.050000 56.0 41.88 20.22 18.900000 54.400000 6.8 3.3 1.00 3.24 14.3752 4.7434 1.3460 11.40 13.2700 21.926667 2.3134 0.5770 0.9740 0.2782 0.4996 0.6590 0.8040 69.6330 0.760600 0.0290 0.3532 8.66 3.7274 7.72 1794.56 12.428 89.444 5.508 14.572 35.212 6.72 14.02 43.78 8.076 2.58 5.308 0.252 97.704 4.80 4.00 5.42 53.42 66.036 14.858 0.0018 20.000000 14.000000 20.500000 25.266667 9.000000 6.000000 3.500000 6.500000 9.0 1.3848 2.8902 1.6648 3.0606 32.000000 36.200000 14.000000 21.500000 31.500000 35.6 14.0 0.1002 1.8030 8.10 4.80 15.40 29.0686 45.4290 5.2950 4.2304 1.5114 62.2996 28.9962 5.2362 149.8118 98.6024 6.1676 0.2126 0.1818 0.1714 8.1290 17.4530 0.4718 0.1073 2.50 20.9422 4.19 11.39 2.000000 14.16
1998 1998.0 5.250000 8.550000 1.280000 3.720000 2.570000 1.640000 0.640000 33.000000 56.0 41.88 20.22 19.600000 54.900000 7.7 3.4 1.20 3.24 14.3752 4.7434 1.5080 11.40 13.1210 21.926667 2.3134 0.5770 0.9760 0.2782 0.4996 0.6590 0.8040 69.0660 0.760600 0.0290 0.3532 8.30 3.7274 7.72 1794.56 12.428 89.444 5.508 14.572 35.212 6.72 14.02 43.78 8.076 2.58 5.308 0.252 97.704 4.80 4.00 5.42 53.42 66.036 15.174 0.0018 20.000000 14.000000 19.000000 25.266667 9.000000 7.000000 4.000000 7.000000 9.0 1.3848 2.8902 1.6648 3.0606 30.000000 36.200000 14.000000 21.000000 31.000000 35.6 14.0 0.1002 3.9990 8.20 4.90 15.70 29.0686 45.4290 5.2950 4.2304 1.5114 62.2996 28.9962 5.2362 149.8118 98.6024 6.1676 0.2126 0.1818 0.1714 7.8550 13.8940 0.4718 0.1073 2.50 20.9422 4.19 11.39 2.000000 16.20
1999 1999.0 5.303333 8.520000 1.300000 3.816667 2.593333 1.653333 0.640000 33.266667 56.0 41.88 20.22 19.466667 54.333333 7.0 5.4 1.80 3.24 14.3752 4.9456 1.4850 11.40 12.2840 21.926667 2.3134 0.5860 0.9610 0.2832 0.4706 0.6628 0.7936 68.4970 0.760600 0.0290 0.3532 7.70 3.7274 7.72 1794.56 13.272 89.096 5.632 15.648 34.448 6.72 14.02 43.14 8.604 3.02 5.432 0.268 97.716 4.80 4.00 5.42 53.42 65.404 13.473 0.0018 21.000000 14.333333 19.666667 25.266667 9.000000 7.333333 4.000000 7.000000 8.2 1.4682 2.9206 1.7696 3.1664 30.666667 36.200000 13.666667 22.000000 32.333333 34.6 15.0 0.1160 2.7120 9.40 5.00 15.70 29.0686 45.4290 5.2950 4.2304 1.5114 60.6060 29.0676 4.6838 152.6168 100.0794 8.5592 0.1900 0.1880 0.1726 7.3420 13.0380 0.4718 0.1073 2.50 20.9422 4.19 11.39 2.333333 15.20
2000 2000.0 5.356667 8.490000 1.320000 3.913333 2.616667 1.666667 0.640000 33.533333 55.8 41.80 20.10 19.333333 53.766667 6.1 4.8 1.70 3.82 14.9004 4.3746 1.2220 11.52 11.7390 21.553333 2.3850 0.5660 1.0000 0.2890 0.5068 0.6290 0.7886 67.9190 0.760600 0.1940 0.2226 9.40 4.9294 7.50 1606.30 14.056 88.528 7.156 17.144 33.544 6.86 14.32 40.92 8.792 3.74 5.516 0.364 97.808 4.36 3.56 4.14 51.74 64.932 11.891 0.0018 22.000000 14.666667 20.333333 25.266667 9.000000 7.666667 4.000000 7.000000 8.8 1.4230 2.8140 1.7010 2.9264 31.333333 36.200000 13.333333 23.000000 33.666667 37.0 14.0 0.0940 2.2590 10.10 5.00 16.40 30.0950 46.7910 5.5600 5.3690 3.8590 60.2844 31.4236 4.4398 158.9028 103.6370 10.5486 0.2050 0.1830 0.1868 6.9800 14.5180 0.4218 0.1079 2.13 20.6600 4.63 12.98 2.666667 14.00
2001 2001.0 5.410000 8.460000 1.340000 4.010000 2.640000 1.680000 0.640000 33.800000 55.8 41.80 20.10 19.200000 53.200000 5.5 3.4 1.30 3.82 14.9004 4.3746 0.9450 11.52 12.2290 21.553333 2.3850 0.5530 1.0270 0.2890 0.5068 0.6290 0.7886 67.6570 0.737000 0.1940 0.2226 8.50 3.2040 7.60 1693.80 14.056 88.528 7.156 17.144 33.544 6.86 14.32 40.92 8.792 3.74 5.516 0.364 97.808 4.36 3.56 4.14 51.74 64.932 13.070 0.0018 23.000000 15.000000 21.000000 28.000000 9.000000 8.000000 4.000000 7.000000 8.8 1.4230 2.8140 1.7010 2.9264 32.000000 41.000000 13.000000 24.000000 35.000000 37.0 14.0 0.1000 3.1100 11.00 5.10 16.80 30.4830 45.8670 5.5520 4.9600 1.7940 60.2844 31.4236 4.4398 158.9028 103.6370 10.5486 0.2240 0.1860 0.1868 6.8410 13.6560 0.5400 0.1079 2.13 20.6600 4.63 12.98 3.000000 13.10
2002 2002.0 5.453333 8.633333 1.326667 4.063333 2.673333 1.663333 0.626667 33.666667 55.8 41.80 20.10 19.133333 53.666667 5.3 4.0 1.70 3.82 14.9004 4.3746 0.7820 11.52 13.6300 21.553333 2.3850 0.5470 1.0370 0.2890 0.5068 0.6290 0.7886 67.4680 0.748800 0.0360 0.3630 9.40 3.3010 7.90 1834.40 14.056 88.528 7.156 17.144 33.544 6.86 14.32 40.92 8.792 3.74 5.516 0.364 97.808 4.36 3.56 4.14 51.74 64.932 11.895 0.0018 22.666667 15.333333 20.000000 26.666667 9.333333 8.333333 4.333333 6.333333 8.8 1.4230 2.8140 1.7010 2.9264 30.000000 38.666667 13.000000 23.666667 34.000000 37.0 14.0 0.0970 2.7490 11.90 5.00 17.30 28.9930 44.5530 5.2360 4.4010 0.8850 60.2844 31.4236 4.4398 158.9028 103.6370 10.5486 0.2190 0.1760 0.1868 6.9170 16.7140 0.3850 0.1079 2.13 20.6600 4.63 12.98 2.000000 12.30
2003 2003.0 5.496667 8.806667 1.313333 4.116667 2.706667 1.646667 0.613333 33.533333 55.8 41.80 20.10 19.066667 54.133333 4.8 3.7 1.20 3.82 14.9004 4.3746 0.6500 11.52 13.2220 21.553333 2.3850 0.5550 1.0230 0.3060 0.3740 0.6770 0.7640 67.2780 0.760600 0.0230 0.3420 9.10 3.6210 7.70 1852.80 14.056 88.528 7.156 17.144 33.544 6.86 14.32 40.92 8.792 3.74 5.516 0.364 97.808 4.36 3.56 4.14 51.74 64.932 12.960 0.0018 22.333333 15.666667 19.000000 25.333333 9.666667 8.666667 4.666667 5.666667 8.8 1.4230 2.8140 1.7010 2.9264 28.000000 36.333333 13.000000 23.333333 33.000000 37.0 14.0 0.0940 7.4270 12.50 5.20 17.60 28.2390 46.0510 4.9840 3.3710 0.4780 60.2844 31.4236 4.4398 158.9028 103.6370 10.5486 0.2250 0.1760 0.1330 6.8020 11.4230 0.6440 0.1079 2.13 20.6600 4.63 12.98 1.000000 12.50
2004 2004.0 5.540000 8.980000 1.300000 4.170000 2.740000 1.630000 0.600000 33.400000 55.0 41.00 20.00 19.000000 54.600000 4.0 2.5 1.35 3.30 14.9004 4.3746 0.4690 10.80 13.1620 21.553333 2.3850 0.5580 1.0160 0.2840 0.4430 0.6630 0.7870 67.0870 0.772400 0.0330 0.3720 10.30 4.0290 7.90 1985.50 14.056 88.528 7.156 17.144 33.544 6.86 14.32 40.92 8.792 3.74 5.516 0.364 97.808 4.36 3.56 4.14 51.74 64.932 12.082 0.0018 22.000000 16.000000 18.000000 24.000000 10.000000 9.000000 5.000000 5.000000 8.8 1.4230 2.8140 1.7010 2.9264 26.000000 34.000000 13.000000 23.000000 32.000000 37.0 14.0 0.0920 3.2520 13.10 5.20 18.10 27.5330 43.8830 5.1430 3.0510 0.5410 60.2844 31.4236 4.4398 158.9028 103.6370 10.5486 0.2440 0.2030 0.1720 6.1880 11.4990 0.4020 0.1079 2.13 20.6600 4.63 12.98 0.000000 12.70
2005 2005.0 5.450000 8.663333 1.266667 4.116667 2.680000 1.623333 0.610000 32.966667 56.0 42.00 20.50 19.400000 56.100000 3.8 2.1 1.50 2.80 15.6520 4.3746 0.3760 11.10 12.4390 21.553333 2.3850 0.5610 1.0150 0.2700 0.5010 0.6540 0.8040 66.8980 0.784200 0.0270 0.3480 11.60 4.4820 8.30 2123.60 14.056 88.528 7.156 17.144 33.544 6.86 14.32 40.92 8.792 3.74 5.516 0.364 97.808 4.36 3.56 4.14 51.74 64.932 12.452 0.0010 21.000000 15.333333 16.666667 22.333333 9.666667 10.000000 6.000000 6.000000 8.8 1.4230 2.8140 1.7010 2.9264 23.666667 31.000000 12.000000 21.666667 29.666667 37.0 14.0 0.0980 2.4320 13.50 5.10 17.70 27.4710 45.6210 5.1340 3.0750 0.6010 60.2844 31.4236 4.4398 158.9028 103.6370 10.5486 0.2200 0.2200 0.1840 6.2640 14.7550 0.3880 0.1079 2.13 20.6600 4.63 12.98 0.000000 14.00
2006 2006.0 5.360000 8.346667 1.233333 4.063333 2.620000 1.616667 0.620000 32.533333 57.0 43.00 21.00 19.800000 57.600000 3.9 1.9 1.70 2.80 14.9320 4.3746 0.2950 10.70 14.6120 17.900000 2.3850 0.5750 0.9890 0.2640 0.5480 0.6520 0.8190 66.6140 0.796000 0.0260 0.3410 12.30 4.6960 8.60 2396.50 14.056 88.528 7.156 17.144 33.544 6.86 14.32 40.92 8.792 3.74 5.516 0.364 97.808 4.36 3.56 4.14 51.74 64.932 12.520 0.0010 20.000000 14.666667 15.333333 20.666667 9.333333 11.000000 7.000000 7.000000 8.8 1.4230 2.8140 1.7010 2.9264 21.333333 28.000000 11.000000 20.333333 27.333333 37.0 14.0 0.0990 1.9220 13.30 6.10 20.10 27.9880 43.3940 5.1610 3.3310 0.5750 64.2860 33.9290 7.1430 152.7330 92.9260 9.0030 0.2660 0.2210 0.1800 6.3580 11.7100 0.4690 0.1079 2.13 20.6600 4.63 12.98 0.000000 12.10
2007 2007.0 5.270000 8.030000 1.200000 4.010000 2.560000 1.610000 0.630000 32.100000 56.0 42.00 20.00 20.200000 59.100000 3.6 1.2 1.40 3.10 14.7430 3.2500 0.2130 10.90 13.4410 19.933333 2.3850 0.5760 0.9870 0.2670 0.6320 0.6490 0.8460 66.3120 0.812857 0.0250 0.3280 13.10 5.0000 8.30 2440.00 10.400 89.600 6.600 12.700 36.200 6.86 14.32 43.00 7.400 1.80 5.100 0.200 97.600 4.36 3.56 4.14 51.74 67.700 11.855 0.0010 19.000000 14.000000 14.000000 19.000000 9.000000 12.000000 8.000000 8.000000 12.0 1.5790 2.8420 1.7860 2.8570 19.000000 25.000000 10.000000 19.000000 25.000000 34.0 14.0 0.1140 2.8700 13.10 5.40 17.20 28.6240 43.0980 5.2040 3.4880 0.6010 63.4210 37.6910 7.4410 159.6150 105.1280 7.6920 0.2750 0.2260 0.1880 6.1830 12.0750 0.5770 0.1079 2.13 20.6600 4.63 12.98 0.000000 11.90
2008 2008.0 5.320000 8.350000 1.290000 3.990000 2.560000 1.610000 0.630000 33.000000 55.0 41.00 19.00 19.800000 59.000000 4.0 2.1 1.10 3.20 14.1540 3.3890 0.1740 11.10 12.5340 21.966667 2.3850 0.5670 1.0090 0.2670 0.6210 0.6410 0.8360 66.0130 0.829714 0.0340 0.3000 13.80 4.7080 9.10 2718.20 11.180 89.540 6.180 13.420 35.820 6.86 14.32 43.30 7.660 2.10 5.180 0.220 97.640 4.36 3.56 4.14 51.74 67.060 12.670 0.0010 20.000000 13.000000 12.000000 19.000000 8.000000 12.000000 6.000000 6.000000 11.0 1.2500 2.8000 1.5380 2.8460 17.000000 26.000000 10.000000 17.000000 27.000000 42.0 11.0 0.1200 8.9290 12.40 5.10 16.80 28.8460 43.3890 5.2760 3.6520 0.4670 67.3440 32.0660 4.7870 151.1900 92.2620 3.5710 0.2700 0.1910 0.1760 6.1140 11.9720 0.4400 0.1079 2.13 20.6600 4.63 12.98 2.000000 12.50
2009 2009.0 5.240000 8.190000 1.270000 3.910000 2.530000 1.580000 0.630000 32.700000 54.5 40.00 18.50 20.000000 57.600000 5.8 2.9 1.40 4.30 13.4360 5.2330 0.3740 14.10 12.8730 24.000000 2.3850 0.5760 0.9960 0.2780 0.5540 0.6500 0.8100 65.7090 0.846571 0.0410 0.2950 10.80 4.6520 9.70 2954.50 11.960 89.480 5.760 14.140 35.440 6.86 14.32 43.60 7.920 2.40 5.260 0.240 97.680 4.36 3.56 4.14 51.74 66.420 12.335 0.0020 21.000000 14.000000 13.000000 18.000000 9.000000 9.000000 5.000000 5.000000 7.0 1.4290 2.5710 1.7140 2.9290 17.000000 25.000000 11.000000 20.000000 30.000000 42.0 14.0 0.1750 24.0760 12.40 6.00 17.90 27.6060 41.8460 5.2770 3.7260 0.5640 62.5000 25.8750 6.2500 135.8660 103.9510 11.2460 0.2890 0.2020 0.1890 6.3130 15.8050 0.5430 0.0840 1.10 20.1000 5.70 14.20 1.000000 11.50
2010 2010.0 5.290000 8.250000 1.240000 4.000000 2.550000 1.570000 0.620000 32.300000 54.0 39.00 18.00 20.100000 60.100000 6.1 3.0 1.50 3.90 13.7690 5.6010 0.5460 13.40 12.7220 23.966667 2.3850 0.5560 1.0310 0.2870 0.5730 0.6690 0.8240 65.4050 0.863429 0.0520 0.2990 10.10 4.7770 9.70 3006.40 12.740 89.420 5.340 14.860 35.060 6.86 14.32 43.90 8.180 2.70 5.340 0.260 97.720 4.36 3.56 4.14 51.74 65.780 12.428 0.0030 21.000000 15.000000 13.000000 18.000000 10.000000 11.000000 5.000000 4.000000 6.0 1.3330 2.8570 1.6670 3.0000 19.000000 26.000000 13.000000 23.000000 32.000000 33.0 18.0 0.1970 10.7340 12.10 6.00 18.30 28.8460 42.2170 5.4620 3.9200 0.6170 63.7890 28.0750 5.5900 146.9510 104.2680 4.5730 0.2910 0.2070 0.2000 5.9860 9.6540 0.5480 0.0890 1.85 19.6000 4.25 11.75 1.000000 10.80
2011 2011.0 5.910000 9.860000 1.440000 4.410000 2.670000 1.620000 0.610000 35.000000 55.0 40.20 18.80 18.900000 57.500000 6.0 2.9 1.70 4.00 13.2850 5.5920 0.5300 13.20 13.6930 23.933333 2.3850 0.5570 1.0310 0.2910 0.5290 0.6780 0.8130 65.1030 0.880286 0.0570 0.2540 9.00 4.7030 9.60 3106.50 13.520 89.360 4.920 15.580 34.680 6.86 14.32 44.20 8.440 3.00 5.420 0.280 97.760 4.36 3.56 4.14 51.74 65.140 12.721 0.0030 21.000000 15.000000 14.000000 21.000000 10.000000 9.000000 5.000000 5.000000 8.0 1.5240 3.0000 1.8000 3.0000 18.000000 28.000000 11.000000 19.000000 29.000000 34.0 13.0 0.1700 7.1180 11.00 5.70 16.60 31.9480 48.2190 5.6250 4.1160 0.6340 60.9060 25.1490 4.2910 146.4070 98.2040 9.2810 0.2840 0.1990 0.2140 5.8390 8.4400 0.4530 0.0940 2.60 19.1000 2.80 9.30 1.000000 10.30
2012 2012.0 5.160000 7.990000 1.220000 3.930000 2.640000 1.680000 0.630000 32.200000 56.0 41.40 19.60 20.100000 58.700000 6.4 3.9 1.60 4.20 13.2640 5.8850 0.8510 13.50 13.5550 23.900000 2.3850 0.5540 1.0370 0.2920 0.4870 0.6680 0.7940 64.8060 0.897143 0.0590 0.1710 8.70 4.8050 9.70 3156.80 14.300 89.300 4.500 16.300 34.300 6.70 13.60 44.50 8.700 3.30 5.500 0.300 97.800 6.60 4.50 4.70 55.50 64.500 12.402 0.0030 21.000000 15.000000 14.000000 19.000000 9.000000 9.000000 6.000000 5.000000 8.0 1.2380 2.9520 1.5330 3.6000 20.000000 27.000000 12.000000 23.000000 31.000000 35.0 14.0 0.1400 6.7520 10.80 5.40 16.90 32.0550 49.7840 5.7400 3.9960 0.5090 59.0960 20.1370 1.7160 143.3530 92.4860 0.2890 0.2720 0.1830 0.1940 5.2470 10.2090 0.4730 0.0990 2.20 21.2805 3.75 10.50 1.000000 9.10
2013 2013.0 5.430000 8.460000 1.320000 4.150000 2.620000 1.660000 0.630000 33.600000 57.0 42.60 20.40 19.700000 58.300000 5.8 3.6 1.60 4.10 14.0850 5.3330 0.7000 11.90 14.0800 24.233333 1.7930 0.5630 1.0210 0.2940 0.5130 0.6630 0.7950 64.4120 0.914000 0.1170 0.1820 8.50 5.0380 9.40 3346.40 14.500 89.500 6.100 16.400 33.700 7.20 14.50 43.30 8.700 4.40 5.800 0.500 98.200 4.50 4.30 6.50 54.90 63.400 12.166 0.0030 22.000000 15.000000 12.000000 18.000000 10.000000 10.000000 4.000000 2.000000 7.0 1.3180 2.7730 1.6670 3.1330 16.000000 24.000000 13.000000 21.000000 30.000000 39.0 12.0 0.1180 3.5260 10.60 5.70 17.90 32.6360 51.1010 5.8350 3.9350 0.4900 59.8330 29.1670 3.5000 162.3930 94.0170 9.9720 0.2650 0.1820 0.1950 5.2310 10.8060 0.4780 0.1040 1.80 23.4610 4.70 11.70 1.000000 11.20
2014 2014.0 5.800000 9.470000 1.370000 4.095000 2.770000 1.680000 0.610000 34.300000 58.0 43.80 21.20 19.000000 56.200000 5.4 2.9 1.60 4.10 14.0320 4.7640 0.7330 11.40 13.8650 24.566667 2.0470 0.5430 1.0600 0.2960 0.5410 0.6500 0.8010 64.0000 0.880286 0.1920 0.1780 9.30 5.3010 9.40 3453.30 15.500 91.400 6.200 18.600 33.400 7.00 14.00 41.50 8.400 4.00 5.400 0.700 98.400 3.90 3.90 5.30 53.40 62.600 11.721 0.0030 21.000000 16.000000 13.000000 17.000000 10.000000 10.000000 6.000000 4.000000 7.0 1.6670 2.9520 2.0620 3.3750 18.000000 25.000000 13.000000 23.000000 31.000000 37.0 16.0 0.1000 2.7840 10.20 5.20 17.30 33.5130 53.7220 5.9590 3.7030 0.4750 66.6670 31.1110 2.7780 161.4330 110.7440 5.7850 0.2390 0.1640 0.1910 5.1790 9.5360 0.5950 0.1190 2.85 22.6305 4.95 11.65 1.000000 9.90
2015 2015.0 5.840000 9.690000 1.440000 4.040000 2.620000 1.610000 0.610000 35.000000 59.0 45.00 22.00 19.100000 57.600000 5.4 3.1 1.50 3.90 13.6140 4.6830 0.7070 11.30 13.8910 24.900000 2.4520 0.5570 1.0320 0.2890 0.5160 0.6320 0.7860 63.5850 0.880286 0.1860 0.1680 8.80 5.5050 9.30 3530.10 14.600 88.900 6.200 17.400 33.100 5.80 13.70 40.50 8.600 4.00 6.100 1.100 98.000 5.20 3.30 6.10 54.80 63.100 12.263 0.0030 21.000000 15.000000 11.000000 16.000000 10.000000 11.000000 6.000000 4.000000 6.0 1.7620 2.6670 2.1330 2.8000 14.000000 22.000000 13.000000 21.000000 31.000000 44.0 12.0 0.1000 2.8610 9.80 5.30 17.80 33.0750 53.3980 6.0580 3.6930 0.4590 63.1350 29.7300 2.7030 157.1050 103.7530 7.2390 0.2480 0.1660 0.1880 5.5380 10.4430 0.4640 0.1340 3.90 21.8000 5.20 11.60 1.000000 11.80
2016 2016.0 5.520000 8.910000 1.330000 4.160000 2.590000 1.640000 0.630000 33.600000 57.0 42.88 20.82 19.600000 57.300000 5.1 3.0 1.20 4.30 15.0210 4.4000 0.7210 12.00 11.1880 23.513333 2.2340 0.5560 1.0330 0.2890 0.5430 0.6140 0.7890 63.1780 0.822297 0.1910 0.1190 8.00 5.7970 8.82 2861.24 15.400 87.800 6.800 18.800 32.000 6.90 14.30 40.10 10.300 4.30 5.800 0.300 97.700 3.80 4.00 4.50 48.50 63.900 12.328 0.0030 21.466667 15.400000 14.333333 19.733333 9.866667 9.933333 5.333333 4.333333 8.0 1.5700 2.8468 1.8896 3.0330 19.933333 27.666667 12.800000 22.000000 30.933333 37.6 13.4 0.1080 3.7760 9.40 5.20 17.80 34.4090 56.2210 6.2000 3.6000 0.4000 55.3210 28.4320 2.8280 160.9760 111.6530 16.5310 0.2820 0.1790 0.2010 5.6740 10.6540 0.3410 0.1355 3.00 22.1000 5.20 13.75 1.000000 12.00
2017 2017.0 5.700000 9.460000 1.380000 4.060000 2.620000 1.600000 0.610000 34.300000 57.0 42.88 20.82 19.213333 55.666667 4.7 3.1 1.20 4.40 14.4808 4.4860 0.7350 11.80 12.0920 23.513333 3.0410 0.5546 1.0294 0.2840 0.5840 0.6010 0.8020 62.7650 0.760600 0.2820 0.1330 7.90 6.1050 8.52 2589.06 16.700 88.200 7.600 19.900 31.900 6.60 14.30 38.80 9.300 4.90 5.600 0.500 98.100 3.90 2.60 3.00 49.80 62.500 12.636 0.0020 21.466667 15.600000 15.533333 20.933333 9.866667 9.733333 5.533333 4.933333 8.0 1.5700 2.8468 1.8896 3.0330 21.933333 29.666667 12.800000 22.400000 31.333333 37.6 13.4 0.0960 1.0480 9.00 5.18 17.70 32.1670 49.0996 6.2000 3.7000 0.4000 51.0500 25.0000 0.0000 170.0000 116.2160 15.9460 0.3080 0.1920 0.2120 5.5070 7.3010 0.5170 0.1370 2.10 22.4000 5.20 15.90 1.000000 9.40
2018 2018.0 5.658000 9.198000 1.368000 4.101000 2.644000 1.638000 0.618000 34.160000 57.0 42.60 20.40 19.500000 57.620000 4.3 2.6 1.60 4.30 14.0032 5.0130 0.7192 11.90 13.0232 24.306667 2.8100 0.5546 1.0366 0.2820 0.5900 0.5900 0.8010 62.3550 0.870171 0.4380 0.1560 7.80 6.2400 9.48 3318.62 16.600 87.500 8.600 20.900 31.800 6.90 14.90 39.40 9.300 5.60 5.900 0.600 98.000 3.00 2.40 2.00 50.00 63.500 13.667 0.0020 21.200000 15.200000 12.800000 18.200000 9.800000 9.800000 5.400000 4.000000 7.2 1.5018 2.8688 1.8390 3.1816 17.200000 25.200000 12.400000 21.400000 30.400000 37.8 13.4 0.0880 2.1850 9.80 5.36 17.54 33.1600 52.8452 6.0504 3.7262 0.4448 59.2012 28.6880 2.3618 162.3814 107.2766 11.0946 0.3150 0.1790 0.2160 3.9060 9.7480 0.3260 0.1259 2.73 22.4783 5.05 12.92 1.000000 9.20
In [42]:
# Only run this if you want to save your work; otherwise the existing CSV is already good, so just use it
#new_df.to_csv(r'workingdf.csv')
In [3]:
new_df=pd.read_csv(r'workingdf.csv')

I want to try to fill in earlier values for the top 10% and top 1% wealth shares. To do that, I want to check which other wealth and income indicators they correlate with and see whether I can get what looks like a reasonable match.

In [9]:
# Plot the wealth shares alongside the income inequality indicators

import matplotlib.pyplot as plt

inequality_cols = ['Top 10 percent wealth share',
                   'Top 1 percent wealth share',
                   'Q5 to Q1 income share ratio',
                   'D10 to D1 income share ratio',
                   'D10 to D1-4 income share ratio (Palma)',
                   'P90 to P10 income ratio',
                   'P80 to P20 income ratio',
                   'P80 to P50 income ratio',
                   'P50 to P20 income ratio',
                   'GINI']

# new_df still has the default row index here; the rows run from 1982
# to 2018 (one per year), so build the year axis explicitly
years = range(1982, 1982 + len(new_df))

# plot one labelled line per indicator
for col in inequality_cols:
    plt.plot(years, new_df[col], label=col)

# add x and y axis labels and a legend
plt.xlabel('Year')
plt.ylabel('Indicator value')
plt.legend(fontsize='small')

# display the plot
plt.show()

This is my intuitive sketch:

image.png

In [ ]:
	Q5 to Q1 income share ratio	D10 to D1 income share ratio	D10 to D1-4 income share ratio (Palma)	P90 to P10 income ratio	P80 to P20 income ratio	P80 to P50 income ratio	P50 to P20 income ratio	GINI
In [13]:
# calculate the correlation coefficients between all columns
correlations = new_df.corr()
In [11]:
new_df=new_df.drop(columns='Unnamed: 0')
In [1]:
#correlations
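One way to check which indicators track the top wealth shares, as a rough sketch: rank every other column by its correlation with a chosen target column. The helper name `rank_correlations` and the toy values below are made up for illustration; with the real data the call would be something like `rank_correlations(new_df, 'Top 1 percent wealth share')`.

```python
import pandas as pd

def rank_correlations(frame, target):
    """Return each other column's Pearson correlation with `target`, strongest first."""
    corr = frame.corr()[target].drop(target)
    # Order by absolute value so strong negative relationships are not buried.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Toy values, not taken from the report's dataset.
toy = pd.DataFrame({
    "Top 1 percent wealth share": [15.0, 16.0, 18.0, 19.0, 21.0],
    "GINI": [30.0, 31.0, 33.0, 33.5, 35.0],
    "Middle class income share": [55.0, 54.0, 52.5, 52.0, 50.0],
    "Unemployment rate": [6.0, 4.0, 7.0, 5.0, 6.5],
})
ranked = rank_correlations(toy, "Top 1 percent wealth share")
print(ranked)
```

Sorting by absolute value keeps indicators that move in the opposite direction near the top of the list, which is useful when looking for a proxy to fill in early years.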

Ok now we're going to try and fill in the gaps.

In [4]:
import pandas as pd
from sklearn.impute import KNNImputer

# create an instance of the KNNImputer class
imputer = KNNImputer(n_neighbors=5)

# fill in missing values using the KNN imputation strategy
new_df_imputed = pd.DataFrame(imputer.fit_transform(new_df), columns=new_df.columns)
In [5]:
new_df_imputed.index=range(1982,2019)
In [29]:
# now we're gonna do it again with the imputer and rate my guesses
# I want to display in the notebook dynamic charts generated from series here

import matplotlib.pyplot as plt

# create a sample DataFrame
#data = {'x': [1, 2, 3, 4, 5], 'y': [2, 4, 6, 8, 10]}
#df = pd.DataFrame(data)

# plot the imputed wealth-share and income-ratio series on one set of axes
inequality_cols = ['Top 10 percent wealth share',
                   'Top 1 percent wealth share',
                   'Q5 to Q1 income share ratio',
                   'D10 to D1 income share ratio',
                   'D10 to D1-4 income share ratio (Palma)',
                   'P90 to P10 income ratio',
                   'P80 to P20 income ratio',
                   'P80 to P50 income ratio',
                   'P50 to P20 income ratio',
                   'GINI']
lines = plt.plot(new_df_imputed[inequality_cols])

# label the axes (several series share the y axis) and identify each line
plt.xlabel('Year')
plt.ylabel('Indicator value')
plt.legend(lines, inequality_cols, fontsize='small')

# display the plot
plt.show()

Okay well it thinks they were pretty much flat whereas I assumed they used to be flatter.

This seems to me like the machine being much too conservative. It's kind of a status quo bias I think.

But obviously I could be wrong and the opposite could be true
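A possible explanation for the flat imputed values, sketched on made-up data: `KNNImputer` fills a gap with the average of the nearest rows that do have a value for that column, so it cannot produce values outside the observed range. For a series that is missing its early years, that tends to give a flat line at roughly the level of the first observed years. The alternative shown here (a straight-line fit extrapolated backwards) is only one assumption among many, and the column names and numbers are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy data (not from the report): two indicators that rise steadily over time.
years = np.arange(1982, 2019)
toy = pd.DataFrame(
    {
        "indicator_a": np.linspace(10.0, 30.0, len(years)),
        "indicator_b": np.linspace(1.0, 5.0, len(years)),
    },
    index=years,
)

# Pretend indicator_b was not recorded before 1990.
toy_missing = toy.copy()
toy_missing.loc[:1989, "indicator_b"] = np.nan

# KNN imputation: each gap becomes the mean of the 5 nearest rows with a value.
knn_filled = pd.DataFrame(
    KNNImputer(n_neighbors=5).fit_transform(toy_missing),
    index=toy_missing.index,
    columns=toy_missing.columns,
)

# Alternative: fit a straight line to the observed years and extrapolate back.
observed = toy_missing["indicator_b"].dropna()
slope, intercept = np.polyfit(observed.index, observed.values, deg=1)
trend_filled = toy_missing["indicator_b"].fillna(
    pd.Series(slope * years + intercept, index=years)
)

comparison = pd.DataFrame(
    {
        "true": toy["indicator_b"],
        "knn": knn_filled["indicator_b"],
        "trend": trend_filled,
    }
).loc[:1989]
print(comparison.round(2))
```

Whether a trend fit is more believable than the flat KNN result for the real indicators is a judgment call; the sketch only shows why the two methods disagree.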

In [32]:
#will try use linear regression machine learning tools

import pandas as pd
from sklearn.linear_model import LinearRegression

# Split your data into input features and target variables
X = new_df_imputed[['Top 10 percent wealth share',
                 'Top 1 percent wealth share',
                 'Q5 to Q1 income share ratio',
                 'D10 to D1 income share ratio',
                 'D10 to D1-4 income share ratio (Palma)',
                 'P90 to P10 income ratio',
                 'P80 to P20 income ratio',
                 'P80 to P50 income ratio',
                 'P50 to P20 income ratio',
                 'GINI']]
y = new_df_imputed['Rate of suicides']

# Create a linear regression model
model = LinearRegression()

# Fit the model to your data
model.fit(X, y)

# Use the model to make predictions
predictions = model.predict(X)
In [33]:
predictions;
Out[33]:
array([13.20625065, 13.07744446, 12.94863827, 12.84846187, 12.74828546,
       13.19076001, 13.63323455, 13.60803243, 13.58283032, 13.82125593,
       14.15935544, 13.81984218, 13.45527804, 14.1060358 , 14.75679356,
       14.53694868, 14.34215469, 13.59841697, 12.868969  , 12.15028217,
       12.29133115, 12.43238012, 12.44567616, 12.24263164, 12.03958712,
       11.89020222, 12.57234912, 12.90319532, 12.17592054, 12.17296657,
       12.43306936, 12.2542247 , 11.62416981, 12.79490669, 12.41342634,
       13.27401805, 12.99527463])

Ok, so I might check some other data sources: the Stats NZ API and data.govt.nz. I had a brief look but will come back to it later. I might also see what else is available programmatically in machine-readable form, or find other ways of accessing data programmatically, possibly by scraping.

In [35]:
import requests

url = "https://api.stats.govt.nz/odata/v1/data.json"
headers = {"Ocp-Apim-Subscription-Key": "dc4ec3513435440ea403e1dcfbc65abd"}

response = requests.get(url, headers=headers)
catalogue = response.json()
In [2]:
#print(catalogue);
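A small variation on the request above, as a sketch: read the subscription key from an environment variable instead of writing it into the notebook, and fail loudly on a bad response. The URL and header name are the ones used in the cell above; the environment variable name `STATS_NZ_API_KEY` and the helper names are assumptions made for this example.

```python
import os
import requests

# URL taken from the cell above.
CATALOGUE_URL = "https://api.stats.govt.nz/odata/v1/data.json"

def build_headers(env_var="STATS_NZ_API_KEY"):
    """Build the request headers from a key stored in an environment variable."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Ocp-Apim-Subscription-Key": key}

def fetch_catalogue(timeout=30):
    """Request the catalogue and return the parsed JSON, raising on HTTP errors."""
    response = requests.get(CATALOGUE_URL, headers=build_headers(), timeout=timeout)
    response.raise_for_status()
    return response.json()
```

With the key exported in the shell that launches Jupyter, `catalogue = fetch_catalogue()` would replace the cell above.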
In [ ]:
#not sure about data.govt.nz; want to get this done in the next four hours to limit penalty days to three
#next thing i guess need to come back to the eda and correlations etc; want a heatmap
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Compute the correlation matrix
corr_matrix = new_df_imputed.corr()

# Plot the correlation matrix as a heatmap
fig, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', ax=ax)

# Customize the plot
ax.set_title('Correlation Matrix')
ax.set_xticklabels(corr_matrix.columns, rotation=90)
ax.set_yticklabels(corr_matrix.columns, rotation=0)

It's too big for a heatmap. So what else? Explore correlations? I want to break it into its dimensions and use those.
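One way to break the matrix into its dimensions, as a minimal sketch: average the absolute correlations within and between groups of columns, which gives a matrix small enough for a heatmap. The function name `dimension_correlations`, the toy columns and the column-to-dimension mapping are all hypothetical; the real mapping would come from the Shared Prosperity dimension lists.

```python
import numpy as np
import pandas as pd

def dimension_correlations(frame, dimension_of):
    """Mean absolute correlation within and between groups of columns."""
    corr = frame.corr().abs()
    # Blank the diagonal so a column's correlation with itself is not counted.
    corr = corr.mask(np.eye(len(corr), dtype=bool))
    labels = np.array([dimension_of[col] for col in corr.columns])
    # Average over the rows of each dimension, then over the columns.
    by_rows = corr.groupby(labels).mean()
    return by_rows.T.groupby(labels).mean()

# Toy data: two made-up dimensions with two indicators each.
rng = np.random.default_rng(0)
income = rng.normal(size=50)
health = rng.normal(size=50)
toy = pd.DataFrame({
    "income_a": income + 0.1 * rng.normal(size=50),
    "income_b": income + 0.1 * rng.normal(size=50),
    "health_a": health + 0.1 * rng.normal(size=50),
    "health_b": health + 0.1 * rng.normal(size=50),
})
groups = {"income_a": "income", "income_b": "income",
          "health_a": "health", "health_b": "health"}
summary = dimension_correlations(toy, groups)
print(summary.round(2))
```

The result has one row and one column per dimension, so it could be passed to `sns.heatmap` with annotations that remain readable.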

Will try factor analysis at various levels next

In [5]:
# Import required libraries
from factor_analyzer import FactorAnalyzer

# Create factor analyzer object
fa = FactorAnalyzer()

# Fit factor analysis on your data
#uncomment next lines once imputed df is done

#fa.fit(new_df_imputed)

# Get factor loadings
#loadings = fa.loadings_   #uncomment this line too

# Print factor loadings
#print(loadings)

I'll turn it into a dataframe

In [4]:
# Create a DataFrame with the loadings
#loadings_df = pd.DataFrame(loadings, index=new_df.columns) #uncomment once imputed df done etc

# Rename the columns to indicate which factor they belong to
#loadings_df.columns = [f"Factor {i+1}" for i in range(3)]  #also uncomment this
In [61]:
#loadings_df; should really reproduce this and sort by to make it actually meaningful to look at but time is running low
Out[61]:
Factor 1 Factor 2 Factor 3
Q5 to Q1 income share ratio 0.350229 0.263251 0.744821
D10 to D1 income share ratio 0.361847 0.326036 0.704491
D10 to D1-4 income share ratio (Palma) 0.375079 0.298428 0.710708
P90 to P10 income ratio 0.350867 0.145675 0.752004
P80 to P20 income ratio 0.444880 0.170358 0.747626
P80 to P50 income ratio 0.678365 0.220154 0.494380
P50 to P20 income ratio -0.092224 -0.130321 -0.772812
GINI 0.428153 0.267289 0.705782
Top 10 percent wealth share -0.091852 0.793823 0.081337
Top 5 percent wealth share -0.004880 0.719987 0.001666
Top 1 percent wealth share 0.182690 0.692383 -0.021016
Lower deciles income share -0.551920 -0.192125 -0.678646
Middle class income share -0.795295 -0.160937 -0.413007
Unemployment rate 0.384017 0.061980 -0.124140
Unemployment rate, 60-64 0.408594 0.313073 0.346450
Unemployment rate, 65+ -0.283550 -0.122786 -0.466559
Underemployment rate -0.409163 0.587865 -0.000205
Percentage of employees working long hours 0.525627 -0.140663 -0.477840
Labour market insecurity 0.068568 0.319287 0.370861
Long-term unemployment rate 0.538077 0.126601 -0.254201
Percentage of youth NEET -0.328980 -0.023806 0.392515
Percentage of workforce on low pay, OECD definition -0.146973 0.000727 0.593454
Percentage of workforce on low pay, relative to minimum wage -0.259395 0.605172 0.466440
Minimum to living wage gap -0.022415 0.127476 -0.417351
Labour share of income 0.287928 -0.410215 -0.406903
Labour productivity to real product wages ratio -0.528520 0.391410 0.420790
Percentage of households spending above 30% of income on rental 0.200092 0.543202 0.380771
Percentage of households spending above 30% of income on housing -0.685012 -0.154001 -0.337507
House affordability, rent 0.266817 -0.374432 0.761553
House affordability, purchase -0.548321 -0.509726 -0.108820
Percentage of home ownership 0.847324 -0.291290 -0.095356
Percentage of homelessness in population -0.890915 0.086594 0.404026
Percentage of Priority A state housing applicants in population -0.421164 0.665464 -0.507921
Percentage of Priority B state housing applicants in population 0.621428 -0.711094 0.199075
Percentage of disposable income spent on household debt -0.255302 -0.750508 0.034042
Median multiple for housing -0.823454 0.368296 -0.337473
Health expenditure as a percentage of GDP -0.930430 -0.024420 0.287866
Health expenditure per capita, PPP -0.957434 0.177926 0.329170
Prevalence of depression, adult 0.243929 0.816961 -0.253500
Prevalence of self-rated health as good or better, adult -0.140904 -0.122854 0.630237
Prevalence of pyschological distress, adult 0.042771 0.171977 -0.688606
Prevalence of mood_anxiety disorders, adult 0.241902 0.777756 -0.381863
Prevalence of healthy weight, adult -0.230885 -0.785376 0.427694
Prevalence of unmet need for after-hours care due to cost, adult 0.190909 -0.308469 -0.079903
Prevalence of unmet need for GP due to cost, adult 0.032788 -0.108336 -0.348008
Prevalence of adequate vegetable and fruit intake, adult -0.141069 -0.502686 0.647818
Prevalence of breakfasting at home less than 5 days, child 0.299486 0.629255 -0.412321
Prevalence of emotional_behavioural problems, child 0.170516 0.798628 -0.352586
Prevalence of diabetes, adult -0.009299 0.817945 -0.129552
Prevalence of depression, child -0.146000 0.783074 0.115811
Prevalence of good or better parent-rated health, child 0.024716 0.743543 0.117121
Prevalence of unfulfilled prescriptions due to cost, child -0.112003 -0.194199 0.371407
Prevalence of unmet need to after hours care due to cost, child -0.020207 -0.236186 0.366494
Prevalence of unmet need for GP due to cost, child -0.229059 0.058914 0.408177
Prevalence of adequate vegetable and fruit intake, child -0.250325 -0.011436 0.486584
Prevalence of healthy weight, child -0.157078 -0.889561 0.056051
Rate of suicides 0.328255 0.038988 -0.359086
Prevalence of problem gambling interventions -0.364745 0.620731 0.493571
Prevalence of poverty 60% ML 0.555560 0.133291 0.705871
Prevalence of poverty 50% ML 0.456947 0.153769 0.765167
Prevalence of poverty 50% AL 0.952935 0.034855 -0.047236
Prevalence of poverty 60% AL 0.934984 -0.011427 -0.197305
Prevalence of poverty 40% AL 0.415516 0.145715 0.782439
Prevalence of poverty, 60% ML, elderly -0.264164 -0.051844 0.657728
Prevalence of poverty 50% ML, elderly -0.128698 -0.080848 0.691796
Prevalence of poverty 50% AL, elderly 0.746581 -0.252005 0.211777
Prevalence of poverty 60% AL, elderly 0.144439 -0.619743 -0.376211
Poverty risk ratio, 60% AL, single under 65 -0.074397 0.573831 0.031137
Poverty risk ratio, 60% AL, solo parent 0.347192 0.185125 0.036720
Poverty risk ratio, 50% AL, single under 65 -0.125510 0.722964 0.122863
Poverty risk ratio, 50% AL, solo parent 0.088397 0.386369 0.150951
Prevalence of poverty, 50% AL, child 0.942782 0.027393 -0.147385
Prevalence of poverty, 60% AL, child 0.930817 0.045034 -0.139779
Prevalence of poverty, 40% ML, child 0.598863 0.249849 0.609329
Prevalence of poverty, 50% ML, child 0.597899 0.185112 0.655111
Prevalence of poverty, 60% ML, child 0.692520 0.222175 0.539524
Prevalence of poverty, 60% AL, children with part time working parent_s -0.492050 0.133673 -0.025018
Prevalence of poverty, 60% AL, children with full time working parent_s 0.418129 -0.015209 0.247194
Prevalence of personal insolvencies -0.336167 -0.341152 0.504042
Loan delinquencies -0.348573 -0.431667 0.326925
Tertiary education participation -0.404402 -0.547589 0.201886
Education expenditure, GDP -0.731266 -0.260656 0.137075
Education expenditure, government expenses -0.667659 -0.077599 -0.044736
Tertiary loan as a percentage of income -0.386253 0.847696 0.023393
Tertiary loan leaving balance as a percentage of income -0.200771 0.877368 0.131334
University fees to income ratio -0.470856 0.826496 -0.141708
Polytechnic fees to income ratio 0.535876 0.117882 0.069422
Wānanga fees to income ratio 0.701753 -0.029791 -0.053507
Degree earnings premium, hourly -0.236607 -0.381692 0.452254
Diploma_certificate earnings premium, hourly -0.018438 -0.329618 -0.142432
School earnings premium, hourly -0.073784 -0.787076 0.096951
Degree earnings premium, weekly 0.168019 0.501267 -0.362791
Diploma_certificate earnings premium, weekly -0.065215 0.404437 -0.469647
School earnings premium, weekly 0.100013 0.220807 -0.608263
Percentage of population in remand -0.855810 0.017514 -0.202576
Percentage of population sentenced -0.089379 -0.707511 -0.071888
Percentage of population post-sentence -0.643938 0.312288 -0.171581
Incidence of crime victimisation 0.917069 -0.222778 -0.023838
Rate of murder and homicide 0.689974 -0.333715 -0.139775
Regional GDP variation 0.055365 -0.179654 0.383025
Regional income inadequacy variation 0.046555 0.742580 -0.399890
Inadequacy Of Income, Gender 0.002504 0.672608 0.035075
Inadequacy Of Income, Housing Tenure -0.195895 0.728400 -0.237233
Inadequacy Of Income, Long-Term Migrant -0.223058 0.220265 -0.239261
Inadequacy Of Income, Māori 0.041560 -0.042832 -0.502046
Low Income, Gender 0.604142 0.174471 0.209359
Gender pay gap 0.871900 -0.228892 -0.091007
In [62]:
#scree plot
import matplotlib.pyplot as plt
from factor_analyzer import FactorAnalyzer

# Create factor analyzer object with 3 factors
fa = FactorAnalyzer(n_factors=3)

# Fit factor analysis to data
fa.fit(new_df_imputed)

# Create scree plot
plt.plot(range(1,new_df_imputed.shape[1]+1), fa.get_eigenvalues()[0])
plt.title('Scree Plot')
plt.xlabel('Factors')
plt.ylabel('Eigenvalue')
plt.grid()
plt.show()

The scree plot lends itself strongly to a single-factor or very low-factor analysis. It seems like there are one or two factors; we could say the main one is inequality: statistically, just a high concentration of wealth and an increase over time.
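To put numbers next to the scree plot, a rough sketch: compute the eigenvalues of the correlation matrix directly and report each one's share of the total. The function name `eigenvalue_summary` and the toy data are illustrative assumptions; the count of eigenvalues above 1 is the common Kaiser rule of thumb, not something the analysis above relies on.

```python
import numpy as np
import pandas as pd

def eigenvalue_summary(frame):
    """Eigenvalues of the correlation matrix, largest first, with variance shares."""
    corr = frame.corr().to_numpy()
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    share = eigenvalues / eigenvalues.sum()
    return pd.DataFrame(
        {"eigenvalue": eigenvalues, "share": share, "cumulative": np.cumsum(share)},
        index=pd.RangeIndex(1, len(eigenvalues) + 1, name="factor"),
    )

# Toy data: three columns driven by one common factor plus one unrelated column.
rng = np.random.default_rng(0)
common = rng.normal(size=200)
toy = pd.DataFrame({
    "ratio_a": common + 0.3 * rng.normal(size=200),
    "ratio_b": common + 0.3 * rng.normal(size=200),
    "ratio_c": common + 0.3 * rng.normal(size=200),
    "unrelated": rng.normal(size=200),
})
summary = eigenvalue_summary(toy)
print(summary.round(3))
print("eigenvalues above 1:", int((summary["eigenvalue"] > 1).sum()))
```

On the real data the call would be `eigenvalue_summary(new_df_imputed)`; a first share far above the rest would support the single-factor reading.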

In [80]:
#will try a pairs plot broken up into chunks
import os
import matplotlib.pyplot as plt
#import seaborn as sns

# create folder if it doesn't exist
#folder = "pairsplots"
#os.makedirs(folder, exist_ok=True)

# create a list of column names to include in the pairs plot
#cols = list(new_df_imputed.columns)

# divide the list of columns into chunks of 10
#n_cols = 10
#chunks = [cols[i:i + n_cols] for i in range(0, len(cols), n_cols)]

# create a pairs plot for each chunk of columns (including the last) and save it to a file
#for i, chunk in enumerate(chunks):
#    grid = sns.pairplot(new_df_imputed[chunk])
#    grid.savefig(os.path.join(folder, "chunk_{}.png".format(i)))
#    plt.close(grid.fig)
In [69]:
from IPython.display import display, HTML

Principal Component Analysis:

In [81]:
from sklearn.decomposition import PCA

# fit PCA to the data
pca = PCA()
pca.fit(new_df_imputed)

# get the principal components
pcs = pca.transform(new_df_imputed)

# plot the explained variance ratio of each component
plt.plot(pca.explained_variance_ratio_)
plt.xlabel('Component')
plt.ylabel('Explained Variance Ratio')
plt.show()
In [82]:
# will use first two components for a transform:

# fit PCA to the data
pca = PCA(n_components=2)
pcs = pca.fit_transform(new_df_imputed)

# plot the first two principal components
plt.scatter(pcs[:,0], pcs[:,1])
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()
In [6]:
#commented out for avoiding dump

#import pandas as pd
#from sklearn.decomposition import PCA

# fit PCA to the data
#pca = PCA(n_components=2)
#pca.fit(new_df_imputed)

# extract the component weights
#weights = pca.components_
                                
# print the weights and corresponding variable names

#for i, comp in enumerate(weights):
#    print(f"Principal Component {i+1} weights:")
#    for j, var in enumerate(new_df_imputed.columns):
#        print(f"{var}: {comp[j]}")
In [88]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# fit PCA to the data
pca = PCA(n_components=2)
pca.fit(new_df_imputed)

# extract the component weights
weights = pca.components_

# create a dataframe of the weights with variable names as row labels
weights_df = pd.DataFrame(weights, columns=new_df_imputed.columns)
weights_df.index = ['Component 1', 'Component 2']

# create a bar chart of the weights
fig, ax = plt.subplots(figsize=(10, 5))
weights_df.plot(kind='bar', ax=ax)
ax.set_title('Principal Component Weights')
ax.set_ylabel('Weight')
ax.set_xlabel('Component')
plt.show()
In [89]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# fit PCA to the data
pca = PCA(n_components=2)
pca.fit(new_df_imputed)

# extract the component weights for the second component
weights = pca.components_[1]

# create a dataframe of the weights with variable names as row labels
weights_df = pd.DataFrame(weights, index=new_df_imputed.columns, columns=['Weight'])

# sort the weights by absolute value
weights_df['AbsWeight'] = weights_df['Weight'].abs()
weights_df.sort_values('AbsWeight', ascending=False, inplace=True)

# create a bar chart of the weights
fig, ax = plt.subplots(figsize=(10, 5))
weights_df['Weight'].plot(kind='bar', ax=ax)
ax.set_title('Second Principal Component Weights')
ax.set_ylabel('Weight')
ax.set_xlabel('Variable')
plt.show()
In [90]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# fit PCA to the data
pca = PCA(n_components=2)
pca.fit(new_df_imputed)

# extract the component weights for the second component
weights = pca.components_[1]

# create a dataframe of the weights with variable names as row labels
weights_df = pd.DataFrame(weights, index=new_df_imputed.columns, columns=['Weight'])

# sort the weights by absolute value
weights_df['AbsWeight'] = weights_df['Weight'].abs()
weights_df.sort_values('AbsWeight', ascending=False, inplace=True)

# create separate bar charts for positive and negative weights
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(12, 5))
weights_df[weights_df['Weight'] >= 0]['Weight'].plot(kind='bar', ax=ax1, color='green')
ax1.set_title('Positive Weights')
ax1.set_ylabel('Weight')
ax1.set_xlabel('Variable')

weights_df[weights_df['Weight'] < 0]['Weight'].plot(kind='bar', ax=ax2, color='red')
ax2.set_title('Negative Weights')
ax2.set_ylabel('Weight')
ax2.set_xlabel('Variable')

plt.show()
In [91]:
# extract the component weights for the second component
weights = pca.components_[1]

# create a list of weights and their corresponding variable names
weight_pairs = list(zip(new_df_imputed.columns, weights))

# sort the list by absolute value of the weight
weight_pairs.sort(key=lambda x: abs(x[1]), reverse=True)

# extract the first 5 positive and negative weights
positive_weights = [pair for pair in weight_pairs if pair[1] > 0][:5]
negative_weights = [pair for pair in weight_pairs if pair[1] < 0][:5]

print("First 5 positive weights:")
for pair in positive_weights:
    print(pair[0], pair[1])

print("\nFirst 5 negative weights:")
for pair in negative_weights:
    print(pair[0], pair[1])
First 5 positive weights:
Prevalence of poverty, 60% ML, child 0.4248203862823615
Prevalence of poverty, 50% ML, child 0.37560118529435643
Prevalence of poverty 60% ML 0.2998481999315203
Prevalence of poverty 50% ML 0.2724697600228968
Prevalence of poverty, 40% ML, child 0.23450214487466195

First 5 negative weights:
Middle class income share -0.2580506435642271
Diploma_certificate earnings premium, weekly -0.20536839493145578
School earnings premium, weekly -0.14903337865404936
Lower deciles income share -0.1134454701402376
Degree earnings premium, weekly -0.11184118960807596
In [92]:
import matplotlib.pyplot as plt

# extract the component weights for the second component
weights = pca.components_[1]

# create a list of weights and their corresponding variable names
weight_pairs = list(zip(new_df_imputed.columns, weights))

# sort the list by absolute value of the weight
weight_pairs.sort(key=lambda x: abs(x[1]), reverse=True)

# extract the first half of positive weights
positive_weights = [pair for pair in weight_pairs if pair[1] > 0][:len(weight_pairs)//2]

# extract variable names and weights for the bar chart
variables = [pair[0] for pair in positive_weights]
weights = [pair[1] for pair in positive_weights]

# create horizontal bar chart
fig, ax = plt.subplots()
ax.barh(variables, weights)
ax.set_xlabel('Weight')
ax.set_ylabel('Variable Name')
ax.set_title('Positive Weights for Second Principal Component (First Half)')

plt.show()
In [93]:
import matplotlib.pyplot as plt

# extract the component weights for the second component
weights = pca.components_[1]

# create a list of weights and their corresponding variable names
weight_pairs = list(zip(new_df_imputed.columns, weights))

# sort the list by absolute value of the weight
weight_pairs.sort(key=lambda x: abs(x[1]), reverse=True)

# extract the first half of negative weights
negative_weights = [pair for pair in weight_pairs if pair[1] < 0][:len(weight_pairs)//2]

# extract variable names and weights for the bar chart
variables = [pair[0] for pair in negative_weights]
weights = [pair[1] for pair in negative_weights]

# create horizontal bar chart
fig, ax = plt.subplots()
ax.barh(variables, weights)
ax.set_xlabel('Weight')
ax.set_ylabel('Variable Name')
ax.set_title('Negative Weights for Second Principal Component (First Half)')

plt.show()
In [94]:
import matplotlib.pyplot as plt

# extract the component weights for the second component
weights = pca.components_[1]

# create a list of weights and their corresponding variable names
weight_pairs = list(zip(new_df_imputed.columns, weights))

# sort the list by absolute value of the weight
weight_pairs.sort(key=lambda x: abs(x[1]), reverse=True)

# extract the first twenty negative weights
negative_weights = [pair for pair in weight_pairs if pair[1] < 0][:20]

# extract variable names and weights for the bar chart
variables = [pair[0] for pair in negative_weights]
weights = [pair[1] for pair in negative_weights]

# create horizontal bar chart
fig, ax = plt.subplots()
ax.barh(variables, weights)
ax.set_xlabel('Weight')
ax.set_ylabel('Variable Name')
ax.set_title('Negative Weights for Second Principal Component (First Twenty)')

plt.show()
In [95]:
# extract the component weights for the first component
weights = pca.components_[0]

# create a list of weights and their corresponding variable names
weight_pairs = list(zip(new_df_imputed.columns, weights))

# sort the list by weight (in descending order)
weight_pairs.sort(key=lambda x: x[1], reverse=True)

# extract the variable name with the highest positive weight
most_positive_var = weight_pairs[0][0]

print(f"The variable with the highest positive weight for the first principal component is {most_positive_var}.")
The variable with the highest positive weight for the first principal component is Health expenditure per capita, PPP.
In [96]:
import matplotlib.pyplot as plt

# extract the component weights for the first component
weights = pca.components_[0]

# create a list of weights and their corresponding variable names
weight_pairs = list(zip(new_df_imputed.columns, weights))

# sort the list by weight (in descending order)
weight_pairs.sort(key=lambda x: x[1], reverse=True)

# extract the names and weights of the top five variables with highest positive weights
top_positives = weight_pairs[:5][::-1]  # [::-1] reverses the order of the list for plotting

# create a bar chart of the top five variables
plt.barh(range(len(top_positives)), [pair[1] for pair in top_positives], align='center')
plt.yticks(range(len(top_positives)), [pair[0] for pair in top_positives])
plt.xlabel("Weight")
plt.title("Top Five Variables with Highest Positive Weights for Component 1")
plt.show()
In [97]:
import matplotlib.pyplot as plt

# extract the second principal component from the PCA object
second_component = pca.transform(new_df_imputed)[:, 1]

# plot the second component against the index of the dataframe
plt.plot(new_df_imputed.index, second_component)
plt.xlabel("Time")
plt.ylabel("Component 2 Score")
plt.title("Second Principal Component over Time")
plt.show()
In [100]:
import matplotlib.pyplot as plt

# extract the first principal component from the PCA object
first_component = pca.transform(new_df_imputed)[:, 0]

# extract the variable that has the highest weight in the first component
top_index = pca.components_[0].argmax()  # column position, not a name
my_var = new_df_imputed.iloc[:, top_index]

# plot the variable against the index of the dataframe
plt.plot(new_df_imputed.index, my_var)
plt.xlabel("Time")
plt.ylabel("Variable Value")
plt.title("Health expenditure over time")
plt.show()
In [99]:
print("Explained Variance Ratio:")
print(pca.explained_variance_ratio_)
Explained Variance Ratio:
[9.99215532e-01 2.75163168e-04]
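A caveat on the explained variance ratio above, with a sketch on made-up data: scikit-learn's `PCA` centres each column but does not rescale it, so a column measured in large units (some values in the table are in the thousands) can dominate the first component on scale alone. Standardising first with `StandardScaler` is one common remedy. The toy column names and distributions below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy data: three unrelated columns on very different scales.
rng = np.random.default_rng(0)
n_years = 37
toy = pd.DataFrame({
    "spend_per_capita": rng.normal(2500, 400, n_years),  # values in the thousands
    "income_share": rng.normal(0.3, 0.05, n_years),       # values below 1
    "rate_per_100k": rng.normal(12, 2, n_years),
})

# PCA on the raw values: the large-scale column swamps everything else.
raw_ratio = PCA(n_components=2).fit(toy).explained_variance_ratio_

# PCA after giving every column zero mean and unit variance.
scaled = StandardScaler().fit_transform(toy)
scaled_ratio = PCA(n_components=2).fit(scaled).explained_variance_ratio_

print("raw:   ", raw_ratio.round(4))
print("scaled:", scaled_ratio.round(4))
```

If the same pattern holds for `new_df_imputed`, the weights plotted above would mostly reflect units rather than structure, and rerunning the PCA cells on the scaled data would be worth comparing.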

Part Two: History¶

How did we get to the point that we could generate a report¶

This is all the rest of the code, showing how I started and cleaned the dataset.

Shared Prosperity Index Dataset Analysis Project¶

Step One : Cleaning¶

The first thing I did in assessing the data series was to start checking them against the plots on the sharedprosperity website. This was taking a while so I started trying to do it programmatically, and then things escalated.

It probably would have been faster and easier to just clean the data in place using a variety of techniques but instead I decided to replace it wholesale with data that wasn't mangled.

To do this I went to each of the "see all indicators" pages on the website and used a chrome extension to batch download the images.

I then selected the content of the table of indicator plots and copied it into a text document, which didn't preserve the images but kept the titles as a list. This is because I noticed the urls of the images, when you removed the .jpg suffix, went to an interactive plotly page where you could retrieve the data.

The first time I tried constructing an index of indicator names to codes I did it manually, typing names into a spreadsheet and then right-clicking on images and copying addresses. When I created the completed dataframe from this it had obvious errors, so after using one of my extension days I had to go back to square one and do things more programmatically.

I saved the images into subfolders named after the category and ordered them by their creation date, then created lists of the filenames to get the urls for the dataseries on plotly. They were downloaded in order from each page so the ordering matched the order of the list of titles.

I took the three name versions for each indicator in the data documentation and created a single csv with all of these, correcting one frameshift error that was in the documentation. I also noticed that the graph titles were similar to the "Indicator name (alternative)" column and used fuzzy string matching to match these.

I used this to rename columns. I also created a csv which had the four digit codes for the urls next to the indicator names from the graph titles, and set up a plotly account so I could use the API to get the data series and create my own dataframe.
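The fuzzy matching step could look roughly like the following. The notes above do not say which tool was actually used, so this sketch swaps in `difflib` from the Python standard library; the function name `match_titles`, the cutoff and the example titles are assumptions.

```python
from difflib import get_close_matches

def match_titles(graph_titles, alternative_names, cutoff=0.6):
    """Map each graph title to its closest alternative name, or None if nothing is close."""
    # Compare in lower case but return the original spelling.
    lookup = {name.lower(): name for name in alternative_names}
    matches = {}
    for title in graph_titles:
        hits = get_close_matches(title.lower(), list(lookup), n=1, cutoff=cutoff)
        matches[title] = lookup[hits[0]] if hits else None
    return matches

# Hypothetical graph titles matched against names that appear in the dataset.
titles = ["Top 1% wealth share", "Gender Pay Gap", "Regional GDP variation"]
names = ["Top 1 percent wealth share", "Unemployment rate", "Gender pay gap"]
matched = match_titles(titles, names)
for title, name in matched.items():
    print(f"{title!r} -> {name!r}")
```

Titles that come back as `None` are the ones to check by hand, which is safer than accepting a weak match silently.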

Some of the code is necessary, but mostly not¶

Until further notice (at least until the second heading), the code below was mostly used in generating and testing the actual dataframe we're going to end up using.
In [15]:
#Ok so we're pretty much starting from scratch. First thing is to bring in the dataset. need pandas
import pandas as pd
df = pd.read_csv("Assignment 1/final_dataset/final_dataset/shared_prosperity_assignment_dataset_mangled.csv")
df=df.sort_values(by='year')
In [2]:
#ok so we've got the dataset imported. Next we want the actual data that's not, you know, mangled. 
#To do that we scrape the plotly. We need the list of indicators. We'll get the list of different names 
#and maybe rename the column names in df 
colmap = pd.read_csv('Default Dataset - Sheet4 (1).csv')

ok yeah need this one¶

In [16]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

Ok not this next one and probably not much for a bit¶

The next cell generates the index of image filenames and indicator codes. It doesn't work perfectly: it leaves some of the traces wrong, and we had to fix some manually. Still, it's better than my first attempt, which was almost entirely manual.

In [64]:
import os
from pathlib import Path

# Specify the path to the parent directory containing the subfolders
parent_dir = Path(r'C:\Users\02065797\Downloads\scrapedimgs')

# Create a list of all subdirectories in the parent directory
subfolders = [subfolder for subfolder in parent_dir.iterdir() if subfolder.is_dir()]

# Iterate over each subfolder
for subfolder in subfolders:
    print(os.path.basename(subfolder))
    # Get a list of all files in the subfolder, sorted by creation time
    files = sorted(os.listdir(subfolder), key=lambda f: os.stat(os.path.join(subfolder, f)).st_ctime)

    # Strip the .jpg suffix from each filename
    filenames = [os.path.splitext(file)[0] for file in files]

    # Print the list of filenames for the current subfolder
    print(filenames)
education
['993', '1065', '1067', '995', '997', '1069', '1071', '1073', '1075', '1078', '1080', '1082', '1084', '1086']
employment
['1033', '1164', '1166', '1035', '1037', '1039', '1041', '1043', '1168', '1170', '1045', '1047', '1049']
general
['1000', '1088', '1090', '1092', '1094', '1096', '1098', '1100']
health
['1003', '1005', '1102', '1104', '1106', '1109', '1111', '1113', '1115', '1117', '1119', '1122', '1124', '1126', '1128', '1130', '1132', '1134', '1136', '1138', '1007', '1009']
housing
['1146', '1016', '1140', '1270', '1148', '1012']
incomewealth
['1018', '1020', '1022', '1024', '1026', '1028', '1030', '1152', '1154', '1156', '1158', '1160', '1162']
safety
['1057', '1059', '1061', '1063', '1214']
socioec
['1172', '1174', '1176', '1180', '1184', '1200', '1052', '1178', '1188', '1190', '1186', '1182', '1209', '1194', '1206', '1202', '1204', '1212', '1198', '1192', '1196', '1054']
In [4]:
colnames=['Indicator','Code']
scrapedcodes=pd.read_csv('indicator codes - sheet2 (3).csv',header=None,names=colnames)
In [79]:
scrapedcodes; 
#I took the output from the filename printouts and copied and pasted them into a google doc
#I used google docs to strip quotes and commas and square brackets then pasted
#into a google sheet next to the indicator names, then used the split into columns function and
#transposed the series to line the codes up next to the indicators.
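The copy-and-paste step described in the comments could in principle be done in Python, which avoids alignment slips. This is a sketch only: `build_code_index` is a hypothetical helper, the codes are copied from the 'safety' printout above, and the titles are placeholders rather than the real graph titles.

```python
import pandas as pd

def build_code_index(titles_by_folder, codes_by_folder):
    """Pair graph titles with image codes, folder by folder, in download order."""
    rows = []
    for folder, codes in codes_by_folder.items():
        titles = titles_by_folder[folder]
        # A length mismatch means the two lists cannot be lined up safely.
        if len(titles) != len(codes):
            raise ValueError(f"{folder}: {len(titles)} titles but {len(codes)} codes")
        for title, code in zip(titles, codes):
            rows.append({"Indicator": title, "Code": code, "Category": folder})
    return pd.DataFrame(rows, columns=["Indicator", "Code", "Category"])

# Codes from the 'safety' printout; titles are placeholders for illustration.
codes_by_folder = {"safety": ["1057", "1059", "1061", "1063", "1214"]}
titles_by_folder = {"safety": [f"Safety title {i}" for i in range(1, 6)]}
index_df = build_code_index(titles_by_folder, codes_by_folder)
print(index_df)
```

The length check is the useful part: it catches a missing or extra image in a folder before the mismatch shifts every later indicator by one.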

Scraping¶

The following section scrapes the values from plotly. For each row of the index we construct a URL from the code, supply our credentials, and ask plotly for the figure. We take the first trace, which looks like the interpolated series; that is probably fine given how much this improves the data relative to the amount of coding required.

In [5]:
#okay so that looks like the best attempt yet, the number of columns matches up, hopefully the more programmatic
#approach has helped sort it out. next we need to set up plotly and scrape the data from tsusnjak so
!pip install chart_studio
In [6]:
import chart_studio.plotly as py

import chart_studio.tools as tls

tls.set_credentials_file(username='tobiasnash', api_key='YXhqKgaipJ77lcF0iZdn')

The next cell scrapes the ~tsusnjak plotly account for the y values of plots with filenames matching the codes in our indicator code index. It should not be executed except for testing, as we have saved the resulting dataframe to a csv and now just use read_csv to load it.

In [151]:
import requests

#create an empty dictionary to hold the data for each indicator
data_dict = {}

#loop over the dataframe
for index, row in scrapedcodes.iterrows():
    #construct the url for the code
    code = row['Code']
    url = f"https://plotly.com/~tsusnjak/{code}"
    
    #retrieve the json data
    figure = py.get_figure(url)
    data = figure['data']
    
    # extract the y-values from the json data
    y_values = data[0]['y']
            
    # store the y-values in the dictionary for the corresponding indicator
    indicator = row['Indicator']
    data_dict[indicator] = y_values
    
#create a new dataframe using the dictionary
new_df = pd.DataFrame(data_dict)
In [152]:
new_df.to_csv(r'programaticscrapeddata.csv') #for safekeeping, can read_csv next time instead of scraping every runtime

The save above is only needed if you haven't saved the csv yet, for example if this is your first run-through and you are doing the scrape yourself. Otherwise just read the file back in:
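The scrape-once-then-reload pattern can be wrapped up so it doesn't depend on remembering which cells to skip. This is a minimal sketch, assuming a hypothetical helper `load_or_build` and a stand-in `fake_scrape` function in place of the real scraping loop.

```python
import os
import tempfile

import pandas as pd


def load_or_build(csv_path, build):
    """Read the cached csv if it exists; otherwise call build() and save the result."""
    if os.path.exists(csv_path):
        # index_col=0 restores the saved index instead of creating an 'Unnamed: 0' column
        return pd.read_csv(csv_path, index_col=0)
    frame = build()
    frame.to_csv(csv_path)
    return frame


def fake_scrape():
    """Stand-in for the real scraping loop; the column name is hypothetical."""
    return pd.DataFrame({"Indicator A": [1.0, 2.0]}, index=[1982, 1983])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.csv")
    first = load_or_build(path, fake_scrape)   # builds and saves
    second = load_or_build(path, fake_scrape)  # reads the cached file
```

Note that reading with `index_col=0` changes what the later cells see: there would be no 'Unnamed: 0' column left to drop.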

In [7]:
new_df=pd.read_csv('programaticscrapeddata.csv') #don't wanna keep starting from scratch
In [8]:
new_df.index=range(1982,2019) #change index to year
In [9]:
df=df.drop(columns=['year']) #get rid of year column and then:
In [10]:
df.index=range(1982,2019) #changing index to year
In [91]:
df; #just checking, but suppress the output, otherwise it's slow af
In [11]:
#need to rename columns in og df
#check colmap
colmap.columns
Out[11]:
Index(['Column name in dataset', 'Indicator name',
       'Indicator name (alternative)'],
      dtype='object')
In [12]:
oldnames=colmap['Column name in dataset']
newnames=colmap['Indicator name (alternative)']
In [13]:
df.rename(columns=dict(zip(oldnames,newnames)), inplace=True)
In [188]:
df;
In [14]:
len(df.columns)
Out[14]:
102
In [15]:
colstodrop=set(new_df.columns)-set(df.columns)
In [16]:
colstodrop; #looks like we have case differences, will fuzzywuzzy new_df
In [17]:
!pip install fuzzywuzzy
from fuzzywuzzy import process
Collecting fuzzywuzzy
  Using cached fuzzywuzzy-0.18.0-py2.py3-none-any.whl (18 kB)
Installing collected packages: fuzzywuzzy
Successfully installed fuzzywuzzy-0.18.0
C:\ProgramData\Anaconda3\lib\site-packages\fuzzywuzzy\fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
In [18]:
dfcols=df.columns
In [19]:
newdfcols=new_df.columns
In [20]:
matchednames =[]

for name in newdfcols:
    matches = process.extractBests(name, colmap['Indicator name (alternative)'], score_cutoff=90)

    if matches:
        matched_name = matches[0][0]
        mn_dict = {"name": name, "match": matched_name}
        matchednames.append(mn_dict)
        
    
        
    elif name == "Household Median Multiples":   #fuzzywuzzy can't match this one
        matched_name = "Median multiple for housing"
        mn_dict = {"name": name, "match": matched_name}
        matchednames.append(mn_dict)
        
    elif name == "Gender Earnings Gap": #or this one I think
        matched_name = "Gender pay gap"
        mn_dict = {"name": name, "match": matched_name}
        matchednames.append(mn_dict)
    else:
        print(name)
Unnamed: 0
Inadequacy Of Income, Gender
Inadequacy Of Income, Housing Tenure
Inadequacy Of Income, Long-Term Migrant
Inadequacy Of Income, Māori
Low Income, Gender
In [21]:
mndf= pd.DataFrame(matchednames)
len(matchednames)
Out[21]:
102
In [123]:
dfcols;
In [124]:
newdfcols;
In [22]:
sum(mndf['match'].duplicated()) #make sure it hasn't matched many to one
Out[22]:
0
In [23]:
newdfoldnames = mndf['name'].tolist()
In [24]:
newdfnewnames = mndf['match'].tolist()
In [25]:
new_df.rename(columns=dict(zip(newdfoldnames,newdfnewnames)), inplace=True)
In [26]:
colstodrop2=set(new_df.columns)-set(df.columns)
In [27]:
colstodrop2
Out[27]:
{'Inadequacy Of Income, Gender',
 'Inadequacy Of Income, Housing Tenure',
 'Inadequacy Of Income, Long-Term Migrant',
 'Inadequacy Of Income, Māori',
 'Low Income, Gender',
 'Unnamed: 0'}

The gender wage gap was a problem, since it should be in both (fixed now). Apart from 'Unnamed: 0', which is just the index column written out by to_csv, the rest are indicators from sharedprosperity that don't appear in the provided dataset.
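The matching logic (fuzzy match first, hand-written overrides for the names the matcher can't handle, then a list of leftovers) can be sketched without fuzzywuzzy. The version below swaps in the standard library's `difflib`, whose `cutoff` is a ratio between 0 and 1 and is not equivalent to fuzzywuzzy's score of 90. The helper `match_names` and the "Top 1%" indicator name are hypothetical.

```python
import difflib


def match_names(scraped, targets, overrides=None, cutoff=0.9):
    """Map each scraped name to a target name; return (matched dict, unmatched list)."""
    overrides = overrides or {}
    lowered = {t.lower(): t for t in targets}  # compare case-insensitively
    matched, unmatched = {}, []
    for name in scraped:
        if name in overrides:  # manual mapping wins over fuzzy matching
            matched[name] = overrides[name]
            continue
        hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
        if hits:
            matched[name] = lowered[hits[0]]
        else:
            unmatched.append(name)
    return matched, unmatched


scraped = ["Household Median Multiples", "Gender Earnings Gap", "Top 1% Income Share", "Unnamed: 0"]
targets = ["Median multiple for housing", "Gender pay gap", "Top 1% income share"]
overrides = {
    "Household Median Multiples": "Median multiple for housing",
    "Gender Earnings Gap": "Gender pay gap",
}

matched, unmatched = match_names(scraped, targets, overrides)
```

Keeping the overrides in one dictionary makes it easier to see which names were matched by hand, and `unmatched` gives the list of columns to drop.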

In [28]:
#have tried modifying the fuzzywuzzy code above but risk mangling things further,
#so will just rename the one column in place
new_df.rename(columns={'Gender Wage Gap':'Gender pay gap'},inplace=True)
In [29]:
testdf=new_df.drop(columns=colstodrop2)
In [30]:
sum(testdf.columns.duplicated())
Out[30]:
0
In [31]:
sum(df.columns.duplicated())
Out[31]:
0
In [32]:
set(testdf.columns)-set(df.columns), set(df.columns)-set(testdf.columns) #checking to see if there's any difference
Out[32]:
(set(), set())
In [33]:
#returns empty set so why is compare() not working?
In [34]:
set(df.columns) == set(testdf.columns)
Out[34]:
True
In [35]:
df=df.apply(pd.to_numeric,errors='coerce') #because maybe its the datatype stopping it from comparing
In [36]:
df=df.astype(float)
In [37]:
df.columns.values, testdf.columns.values; #look the same; will try reindexing
In [38]:
cols = df.columns.tolist()
testdf = testdf.reindex(columns=cols) #ordering them the same to make it work
In [39]:
compdf=df.compare(testdf) #it worked
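The reason the reindex made the difference is that `DataFrame.compare` only accepts frames whose index and columns are identically labelled, in the same order; having the same set of column names is not enough. A toy sketch with made-up columns `a` and `b`:

```python
import pandas as pd

left = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=[1982, 1983])
# same columns and index as left, but the columns are in a different order
right = pd.DataFrame({"b": [3.0, 4.5], "a": [1.0, 2.0]}, index=[1982, 1983])

try:
    left.compare(right)
    raised = False
except ValueError:
    # compare refuses frames that are not identically labelled
    raised = True

# put the columns in the same order, then compare
aligned = right.reindex(columns=left.columns)
diff = left.compare(aligned)
```

Here `diff` holds only the one cell that differs (column `b` in 1983), with the two versions side by side under `self` and `other`.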
In [143]:
#One thing I can't explain at the moment is that the D10:D1-4 data has dropped its first value and shows NaN,
#whereas when I look at the data on plotly directly it has a value of 0.91 for 1982. I think I'm going to
#fill in some of these values manually; it mostly looks right but there are a few gaps. Maybe there were
#errors when calling the plotly data. Another thing I hadn't considered is that I think I called the first
#y trace, the interpolated values rather than the actual values, which is maybe better, but then
#it's strange that there are still gaps. Given that I'm supposed to be doing interpolation as an exercise it would
#be good in one sense to get the actual values and work on those, but part of the point of doing this was to
#complete a bunch of steps in one fell swoop and also teach myself some scraping skills and put them into practice.
#Given the time constraints I think I'll keep the interpolated data, but I need to keep going through it and checking.
#For now I'm going to focus on the income and wealth indicators, because these seem ripe for interpolation
#and also for searching for correlations and interesting relationships with other factors. I think it's expected
#that we use a limited subset of the data after cleaning, because the dataset is so rich that it wouldn't
#necessarily be practical to analyse the entire set in this one assignment.
compdf;

Ok it seems much better. I still need to have a look through, visual inspection of the graphs for the indicators on the website to see if my values match up better than the old ones but this seems promising. Now that I've spent three days figuring out how to do this I can try and get the rest of this finished.

Step Two: Exploratory Data Analysis¶

One thing I noticed is that the loan delinquencies data has negative values. That doesn't seem right, given that percentages generally aren't negative, at least in the context of "How many loans are delinquent out of all loans?" I thought maybe it reflected people paying off loans that had been delinquent, because of boom times in the early nineties or something. I tried to find the original series, which was supposedly from the RBNZ, but there was nothing there going back to 1992, and the only data they had about non-performing loans wasn't limited to personal loans, which is what this data seems to be. I emailed and had a phone conversation with a senior data analyst at the RBNZ, who said it looks like this data series includes data from another source.
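A quick way to find this kind of problem across the whole dataframe, rather than spotting it by eye, is to count negative values per column. This is a sketch on toy data; the column names and values are made up and only stand in for the real indicators.

```python
import pandas as pd

# Toy data; column names and values are hypothetical
frame = pd.DataFrame(
    {
        "Loan delinquencies (%)": [1.2, -0.4, 0.8],
        "Some other share (%)": [10.0, 12.0, 11.0],
    },
    index=[1992, 1993, 1994],
)

# number of negative values in each column
negative_counts = (frame < 0).sum()

# keep only the columns that have at least one negative value
suspect = negative_counts[negative_counts > 0]

# the years in which the suspect column goes negative
suspect_rows = frame.loc[frame["Loan delinquencies (%)"] < 0]
```

Some indicators can legitimately be negative (growth rates, for example), so the result is a list of columns to look at, not a list of errors.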

In [54]:
new_df['D10:D1-4 income share (Palma)'];

Ok actually that looks perfect. I'll try the same column from testdf

In [55]:
testdf['D10:D1-4 income share (Palma)'];

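It seems the problem only comes up in the compare function. Both new_df and testdf show the first value for 1982; it only shows up as NaN in the output of compare. This is expected: compare() shows NaN wherever the two frames agree, so the NaN for 1982 means the values match.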
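A toy illustration of how `compare()` marks agreeing cells, using a made-up column `x` where only the 1982 values are equal. By default equal values are reported as NaN; passing `keep_equal=True` shows them instead.

```python
import pandas as pd

a = pd.DataFrame({"x": [0.91, 1.0, 1.2]}, index=[1982, 1983, 1984])
b = pd.DataFrame({"x": [0.91, 1.1, 1.3]}, index=[1982, 1983, 1984])

# keep_shape=True keeps every row, even those where nothing differs
diff = a.compare(b, keep_shape=True)

# keep_equal=True shows the equal values instead of NaN
diff_eq = a.compare(b, keep_shape=True, keep_equal=True)

nan_for_equal = pd.isna(diff.loc[1982, ("x", "self")])
value_for_equal = diff_eq.loc[1982, ("x", "self")]
```

So a NaN in `compdf` is not a missing value in either dataframe; to check for real gaps, look at `new_df.isna()` directly.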

In [56]:
print(df.equals(testdf))
False
In [57]:
#false; obviously there's huge differences.
In [40]:
goodnames = colmap['Indicator name']
badnames = colmap['Indicator name (alternative)']

new_df.rename(columns=dict(zip(badnames,goodnames)), inplace=True)
# replace "/" with "_"
new_df.columns = new_df.columns.str.replace("/", "_")
#replace : with to
new_df.columns = new_df.columns.str.replace(":", " to ")
In [85]:
new_df;
In [167]:
#gonna make scatter plots of the columns
#getting an error, maybe need to reindex
#the error I think comes from the filenames; I think I'll rename the columns in this one to just
#indicator name, which hopefully doesn't have forbidden characters


import matplotlib.pyplot as plt
import os

# create a directory to store the plots
os.makedirs("scatterplots", exist_ok=True)

# loop over each column in the dataframe
for col in new_df.columns:
    # create a scatter plot
    plt.scatter(new_df.index, new_df[col])
    plt.xlabel("Index")
    plt.ylabel(col)
    plt.title(f"Scatterplot of {col}")
    
    # save the plot to a file
    plt.savefig(f"scatterplots/{col}.png")
    
    # clear the plot for the next iteration
    plt.clf()
<Figure size 640x480 with 0 Axes>
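An alternative to renaming the dataframe columns is to clean up only the filename when saving, so the plot titles keep the original indicator names. A minimal sketch, assuming a hypothetical helper `safe_filename` and the usual set of characters that Windows rejects in filenames:

```python
import re

# Characters not allowed in Windows filenames (assumed list); "/" also breaks paths elsewhere
FORBIDDEN = r'[\\/:*?"<>|]'


def safe_filename(name, replacement="_"):
    """Replace characters that cannot appear in a filename and trim stray whitespace."""
    return re.sub(FORBIDDEN, replacement, name).strip()


example = safe_filename("D10:D1-4 income share (Palma)")
```

In the plotting loop this would be used as `plt.savefig(f"scatterplots/{safe_filename(col)}.png")`, leaving `new_df.columns` untouched.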
In [91]:
new_df.columns.values;
In [41]:
new_df.drop(columns='Unnamed to  0',inplace=True)
In [53]:
#idk, I got a bunch of scatterplots and I want to compare them to the ones I downloaded from sharedprosperity, but
#the ones I downloaded are all saved as just the code numbers. I could compare them programmatically,
#but that would be relying on my own index that I created again, so it wouldn't prove anything.
#so what I might do is just compare them to the images on the website, but it loads slow af because of all
#the fancy css, so maybe I'll just screenshot the pages for quicker comparison later

I visually inspected the scatter plots and compared them to the screenshots from sharedprosperity.co.nz; they all matched although for the first two iterations there were problems with the housing category and the socioeconomic indicators; the number of indicators and codes was the same but the ordering was wrong and there was some duplication. I reordered the index I was using and just scraped all your plots all over again, deleted the scatterplots, remade them and then did it again once I came across the second set of errors. There is obviously a wide range of more sophisticated methods which wouldn't use anywhere near as much machine time but would take me longer to figure out.

We are now in a position to do actual analysis. The entire dataset is in new_df, plus the general inequality indicators which were previously left out. It's all as good as your own interpolation work. My plan is to use the income and wealth indicators as causal variables and test them against various other indicators and see how these things have changed over time because we have limited time left. I may try and include some more data and especially if I get time and more information to clean up the loan delinquency percentage data or determine why it's not looking like what I'd expect.

We also probably need to do some actual data wrangling rather than just stealing your figure interpolated data wholesale. There are still some gaps; at least the gaps are now NaN and able to be processed.
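As a starting point for that wrangling, the remaining gaps can be counted per column and the interior ones filled by linear interpolation. This is only a sketch on toy data with made-up column names; linear interpolation is one option among several and is not necessarily the method used for the imputed dataframe elsewhere in this document.

```python
import numpy as np
import pandas as pd

# Toy data with gaps; column names and values are hypothetical
frame = pd.DataFrame(
    {
        "Indicator A": [1.0, np.nan, 3.0, np.nan],
        "Indicator B": [np.nan, 2.0, 2.5, 3.0],
    },
    index=range(1982, 1986),
)

# how many values are missing in each column
gaps = frame.isna().sum()

# fill only gaps that have valid values on both sides;
# leading and trailing NaN are left alone rather than extrapolated
filled = frame.interpolate(method="linear", limit_area="inside")
```

Leaving the leading and trailing gaps as NaN keeps it visible where a series genuinely starts late or ends early.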